
1 Introduction

Research evidence shows that the precise spatio-temporal firing patterns of groups of neurons can convey relevant information [1], which enables us to use time as a communication and computing resource in spiking neural networks (SNNs). In recent years, learning methods that deal with spatio-temporal spikes in a supervised way have also been explored [2]. Such schemes can train single-layer or multilayer networks to fire a required output spike train. For single-layer networks, different spike-timing-based learning rules have been developed [3, 4]. These rules adopt either an error function minimized by gradient descent or an analog of the Widrow-Hoff (WH) rule. The remote supervised method (ReSuMe) [5] stands out for its effectiveness; it uses Spike-Timing-Dependent Plasticity (STDP) and anti-STDP windows to carry out the learning process. All of these existing single-layer algorithms can complete the training successfully, but their efficiency is low, especially for complex tasks. Therefore, training hierarchical SNNs in a manner closer to that of the brain is required.

To further improve learning performance, Quick Propagation, Resilient Propagation [6] and SpikeProp [7] have been studied. However, due to sudden jumps or discontinuities in the error function, gradient-based learning may fail. Another thread of research uses revised versions of the WH learning rule for SNNs. ReSuMe is extended to the Multilayer Remote Supervised Learning Method (Multi-ReSuMe) in [8], where multiple spikes are considered in each layer. The delay of spike propagation is a vital feature of real biological nervous systems [9]. Combining ReSuMe with delay learning, [10] further puts forward a new algorithm for multiple neurons. Although many efforts have been devoted to SNN structure design and learning, most existing learning methods realize the transformation of relevant information through rate coding or single spikes of neurons [11], owing to the discontinuous nature of neuronal spike timing. Thus, it remains a challenging problem to build an SNN that can learn such spike pattern-to-spike pattern transformations.

In this paper, a novel supervised learning method is presented that trains a multilayer SNN to transmit spatio-temporal spike patterns. The error function from the Widrow-Hoff (WH) rule, based on the difference between the actual and desired output spike trains, is first introduced to change the synaptic weights, and is then applied to neurons firing multiple spikes in each layer through a backpropagation learning rule. The main contributions of this method are: 1) extending the WH rule-based PSD rule to learn spatio-temporal spike patterns in multilayer SNNs and 2) effectively reducing the number of connections, thus improving the computational efficiency of the network. Finally, our method is evaluated thoroughly on benchmark datasets. Experimental results show that the algorithm achieves high learning accuracy and significantly improves the computational efficiency of the network.

Fig. 1. Multilayer network structure. Input neurons are connected to the output neuron via hidden neurons. \(w_{o h}\) denotes the weight between an output neuron o and a hidden neuron h, and \(w_{h i}\) denotes the weight between a hidden neuron h and an input neuron i.

2 Neuron Model

Firstly, we define a spike train as a series of impulses triggered by a specific neuron at its firing times, given in the following form: \(S(t) = {\sum }_f \delta (t-t^f)\), where \(t^f\) is the f-th firing time and \(\delta (x)\) is the Dirac function: \(\delta (x) = 1\) if \(x = 0\) and 0 otherwise. A linear stochastic neuron model in continuous time is then introduced to construct a relation between the input and output impulse trains, as used in [8]. The instantaneous firing rate \(R_i(t)\) of a postsynaptic neuron i is the probability density of firing at time t and is determined by the instantaneous firing rates of its presynaptic neurons j: \(R_i(t) = \frac{1}{k}{\sum }_j w_{ij}R_j(t)\), where k is the number of presynaptic neurons. In a single run we only observe a concrete spike train S(t) rather than R(t) directly. However, R(t) can be defined as the expectation over S(t) for an infinite number of trials:

$$\begin{aligned} R(t)=\langle S(t)\rangle =\lim _{M \rightarrow \infty } \frac{1}{M} \sum _{k=1}^{M} S_{k}(t) \end{aligned}$$
(1)

where M is the number of trials, and \(S_k(t)\) is the concrete spike train for each trial. In this paper, we use R(t) to derive the learning method because of its smoothness. In a single run, R(t) will be replaced by S(t) at a suitable point.
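As an illustration of Eq. 1, the following minimal sketch (our own, not from the paper's implementation) estimates R(t) by averaging discretised spike trains over many trials; the time grid, trial count and firing probability are illustrative assumptions.

```python
import numpy as np

dt = 0.001                        # time step (s)
T = 0.2                           # duration of the observation window (s)
t = np.arange(0.0, T, dt)         # discrete time grid
M = 1000                          # number of trials

rng = np.random.default_rng(0)
p_fire = 20.0 * dt                # per-bin firing probability (~20 Hz)

# S[k, n] = 1 if a spike falls in bin n of trial k, else 0
S = (rng.random((M, len(t))) < p_fire).astype(float)

# R(t) ~ <S(t)>: empirical mean over trials, converted back to a rate (1/s)
R = S.mean(axis=0) / dt
```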

For simplicity, the leaky integrate-and-fire (LIF) model [12] is considered. For a postsynaptic neuron, the input synaptic current is calculated as:

$$\begin{aligned} I_{syn} = \sum _{i} {w_i}{I_{PSC}^i}(t) \end{aligned}$$
(2)

where \(w_i\) is the synaptic efficacy of the i-th afferent neuron, and \(I_{PSC}^i\) is the un-weighted postsynaptic current (PSC) from the corresponding afferent, given by

$$\begin{aligned} I_{PSC}^i (t) = \sum _{j} K(t-t^j)H(t-t^j) \end{aligned}$$
(3)

where \(t^j\) is the time of the j-th impulse triggered by the i-th afferent neuron, H(t) is the Heaviside function, and K is the normalized kernel: \(K(t-t^j)=V_0 \cdot (\exp (\frac{-(t-t^j)}{\tau _m})-\exp (\frac{-(t-t^j)}{\tau _s}))\). \(V_0\) is the normalization factor, and \(\tau _m\) and \(\tau _s\) are the slow and fast decay constants, respectively; their ratio is fixed at \(\tau _m/\tau _s=4\). When the membrane potential \(V_{m}\) crosses the firing threshold \(\vartheta \), the neuron emits an output spike and the membrane potential is reset to \(V_\text {r}\).
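For concreteness, here is a minimal sketch (under our own assumptions, not the authors' code) of the normalized kernel K and the synaptic current of Eqs. 2 and 3; \(V_0\) is chosen so that the kernel peak equals 1, and the parameter values follow the ratio \(\tau _m/\tau _s = 4\).

```python
import numpy as np

tau_m, tau_s = 10e-3, 2.5e-3          # slow / fast decay constants (tau_m / tau_s = 4)

# V0 normalises the kernel so that its peak value is 1
t_peak = tau_m * tau_s / (tau_m - tau_s) * np.log(tau_m / tau_s)
V0 = 1.0 / (np.exp(-t_peak / tau_m) - np.exp(-t_peak / tau_s))

def K(s):
    """Normalised double-exponential kernel evaluated at s = t - t^j, s >= 0."""
    return V0 * (np.exp(-s / tau_m) - np.exp(-s / tau_s))

def psc(t, spike_times):
    """Un-weighted PSC of one afferent: sum of kernels over its past spikes (Eq. 3)."""
    return sum(K(t - tj) for tj in spike_times if t >= tj)

def synaptic_current(t, weights, spike_trains):
    """Weighted sum of the afferent PSCs (Eq. 2)."""
    return sum(w * psc(t, st) for w, st in zip(weights, spike_trains))
```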

2.1 Learning Algorithm

The instantaneous training error is computed according to the difference between the actual instantaneous triggering rate \(R_{o}^{a}(t)\) and the desired instantaneous triggering rate \(R_{o}^{d}(t)\):

$$\begin{aligned} E(t)=E\left( R_{o}^{a}(t)\right) =\frac{1}{2} \sum _{o \in O}\left[ R_{o}^{a}(t)-R_{o}^{d}(t)\right] ^{2} \end{aligned}$$
(4)

Our goal is to minimize the network error in triggering a required output spike pattern through gradient descent with respect to the synaptic weights,

$$\begin{aligned} \varDelta w_{o h}(t)=-\eta \frac{\partial E\left( R_{o}^{a}(t)\right) }{\partial w_{o h}} \end{aligned}$$
(5)

where \(\eta \) is the learning rate. The derivative of the error function can be further expanded via the chain rule. Since R(t) can be replaced at a suitable point by its single-run estimate S(t), the weights are updated according to

$$\begin{aligned} \varDelta w_{o h}(t)=\frac{1}{n_{h}}\left[ S_{o}^{d}(t)-S_{o}^{a}(t)\right] S_{h}(t) \end{aligned}$$
(6)

Following the PSD learning rule derived from the WH rule, we replace the nonlinear product by the spike convolution \(\tilde{s}_{h}(t)=s_{h}(t) * K(t)\). Hence,

$$\begin{aligned} \frac{d \omega _{o h}(t)}{d t}=\eta \left[ s_{o}^{d}(t)-s_{o}^{a}(t)\right] \left[ s_{h}(t) * K(t)\right] \end{aligned}$$
(7)

In PSD, weight adaptation relies only on the current states, unlike rules involving STDP, where both the pre- and post-synaptic spike times are stored and used for adaptation [13]. Integrating Eq. 7 over the time window, we obtain the total weight update:

$$\begin{aligned} \begin{aligned} \varDelta \omega _{o h}&= \int _{0}^{T} \frac{d \omega _{o h}(t)}{d t} dt\\&=\eta [\sum _{m} \sum _{f} K\left( t_{d}^{m}-t_{h}^{f}\right) H\left( t_{d}^{m}-t_{h}^{f}\right) \\&-\sum _{n} \sum _{f} K\left( t_{a}^{n}-t_{h}^{f}\right) H\left( t_{a}^{n}-t_{h}^{f}\right) ] \end{aligned} \end{aligned}$$
(8)
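A minimal sketch of how Eq. 8 can be evaluated for one output-hidden connection is given below (our own illustration; the kernel K is the one sketched in Sect. 2, and the Heaviside factor is realised by the causality test \(t \ge t_{h}^{f}\)).

```python
def delta_w_oh(desired_spikes, actual_spikes, hidden_spikes, eta=0.01):
    """Total weight change for one output-hidden synapse (Eq. 8)."""
    # kernel values between hidden spikes and desired output spikes (potentiation)
    pos = sum(K(td - tf) for td in desired_spikes for tf in hidden_spikes if td >= tf)
    # kernel values between hidden spikes and actual output spikes (depression)
    neg = sum(K(ta - tf) for ta in actual_spikes for tf in hidden_spikes if ta >= tf)
    return eta * (pos - neg)
```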

The weight modifications for hidden layer neurons are computed similarly:

$$\begin{aligned} \begin{aligned} \varDelta w_{h i}(t)&=-\frac{\partial E\left( R_{o}^{a}(t)\right) }{\partial w_{h i}}\\&= -\frac{\partial E\left( R_{o}^{a}(t)\right) }{\partial R_{h}(t)} \frac{\partial R_{h}(t)}{\partial w_{h i}} \end{aligned} \end{aligned}$$
(9)

The weight modification formula of hidden neurons becomes

$$\begin{aligned} \varDelta w_{h i}(t)=\frac{1}{n_{h} n_{i}} \sum _{o \in O}\left[ S_{o}^{d}(t)-S_{o}^{a}(t)\right] S_{i}(t) w_{o h} \end{aligned}$$
(10)

To modify synaptic weights in the same gradient direction, we use the modulus \(|w_{o h}|\) as mentioned in [8]:

$$\begin{aligned} \varDelta w_{h i}(t)=\frac{1}{n_{h} n_{i}} \sum _{o \in O}\left[ S_{o}^{d}(t)-S_{o}^{a}(t)\right] S_{i}(t) |w_{o h}| \end{aligned}$$
(11)

The total weight change for the hidden neurons is

$$\begin{aligned} \begin{aligned} \varDelta \omega _{h i}&=\eta [\sum _{o \in O} \sum _{m} \sum _{f} K\left( t_{d}^{m}-t_{i}^{f}\right) H\left( t_{d}^{m}-t_{i}^{f}\right) \\&-\sum _{o \in O} \sum _{n} \sum _{f} K\left( t_{a}^{n}-t_{i}^{f}\right) H\left( t_{a}^{n}-t_{i}^{f}\right) ] |w_{o h}| \end{aligned} \end{aligned}$$
(12)
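The hidden-layer update of Eq. 12 can be sketched in the same style (our own illustration): the contribution of every output neuron o is weighted by the modulus \(|w_{o h}|\) and paired with the spikes of afferent i. Here `outputs` is a hypothetical list of (desired spikes, actual spikes, \(w_{o h}\)) triples.

```python
def delta_w_hi(outputs, input_spikes, eta=0.01):
    """Total weight change for one hidden-input synapse (Eq. 12)."""
    total = 0.0
    for desired_spikes, actual_spikes, w_oh in outputs:
        pos = sum(K(td - tf) for td in desired_spikes for tf in input_spikes if td >= tf)
        neg = sum(K(ta - tf) for ta in actual_spikes for tf in input_spikes if ta >= tf)
        total += (pos - neg) * abs(w_oh)
    return eta * total
```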

The weights are further adjusted by synaptic scaling [8]:

$$\begin{aligned} w_{i j}=\left\{ \begin{array}{c}{(1+f) w_{i j},~w_{i j}>0} \\ {(\frac{1}{1+f}) w_{i j},~w_{i j}<0}\end{array}\right. \end{aligned}$$
(13)

where f is the scaling factor. We set \(f > 0\) when the firing rate \(r < r_{min}\), and \(f < 0\) for \(r > r_{max}\). The sensitivity of the network to its initial state is reduced by keeping the postsynaptic neuron firing rate within a particular range \([r_{min},r_{max}]\). We introduce the van Rossum metric [13] with a filter function to measure the distance between two spike trains, written as:

$$\begin{aligned} Dist=\frac{1}{\tau } \int _{0}^{\infty }[f(t)-g(t)]^{2} dt \end{aligned}$$
(14)

f(t) and g(t) are the filtered signals of the two spike trains, and \(\tau \) is a free parameter.
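The following sketch (our own, using an exponential filter and illustrative parameter values) shows both the synaptic scaling of Eq. 13 and the van Rossum distance of Eq. 14.

```python
import numpy as np

def scale_weights(w, rate, r_min, r_max, f=0.1):
    """Synaptic scaling of Eq. 13: boost weights if the firing rate is too low,
    shrink them if it is too high; f is the magnitude of the scaling factor."""
    if rate < r_min:
        factor = f
    elif rate > r_max:
        factor = -f
    else:
        return w
    return np.where(w > 0, (1 + factor) * w, w / (1 + factor))

def van_rossum_distance(spikes_f, spikes_g, tau=10e-3, dt=1e-4, T=0.2):
    """Van Rossum distance of Eq. 14 between two spike trains (times in seconds)."""
    t = np.arange(0.0, T, dt)

    def filtered(spike_times):
        # exponentially filtered spike train: sum_k exp(-(t - t_k)/tau) for t >= t_k
        out = np.zeros_like(t)
        for tk in spike_times:
            out += np.where(t >= tk, np.exp(-(t - tk) / tau), 0.0)
        return out

    f_sig, g_sig = filtered(spikes_f), filtered(spikes_g)
    return np.sum((f_sig - g_sig) ** 2) * dt / tau
```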

Fig. 2. Illustration of the training process for a spike sequence. The input spike pattern is displayed at the top. The output neuron is trained to reproduce spikes at the target times (light red bars at the bottom). The middle and bottom panels show the actual spikes fired by the hidden and output neurons after learning, respectively (blue dots). The right panel displays the distance between the target spike sequence and the actual output spike sequence. (Color figure online)

3 Simulations

3.1 Learning Sequences of Spikes

There are N input neurons, and each input neuron emits a random spike train drawn from a uniform distribution over a time interval T. The hidden layer contains H neurons, and the output layer contains M neurons. The default parameters in the following experiments are \(N = 100\), \(H = 200\), \(M = 1\) and \(T = 0.2\) s. The time step is \(dt = 0.01\). The initial synaptic weights are randomly drawn from a Gaussian distribution with mean 0 and standard deviation 0.1. The spike threshold is \(\vartheta = 1\), and the reset potential is 0. The refractory time is set to \(t_{ref} = 0\). We set \(\eta = 0.01\), \(\tau _m = 10\) ms, and \(\tau _s = 2.5\) ms. The target spike sequence is specified as [40, 80, 120, 160] ms. For each run, the training process is performed for up to 500 epochs or until the distance reaches 0. Experimental results are averaged over 20 independent runs. Figure 2 shows the learning process. During the time window T, we use the van Rossum distance to represent the training error. Initially, the neuron may fire at arbitrary times, which yields a large distance. During training, the neuron is gradually taught to fire spikes at the desired times, reflected in the decreasing distance. After 76 learning epochs, the firing times of the output spikes match the target spikes and the error is reduced to 0. This experiment shows that our method can successfully train the neuron to fire a target spike sequence within a small number of training epochs.
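For reference, the set-up described above can be collected into a single configuration structure (a sketch with our own naming; the values are taken from the text).

```python
config = {
    "n_input": 100,        # N input neurons
    "n_hidden": 200,       # H hidden neurons
    "n_output": 1,         # M output neurons
    "T": 0.2,              # time window (s)
    "dt": 0.01,            # simulation time step
    "w_init_mean": 0.0,    # mean of the initial Gaussian weights
    "w_init_std": 0.1,     # standard deviation of the initial weights
    "threshold": 1.0,      # firing threshold
    "v_reset": 0.0,        # reset potential
    "t_ref": 0.0,          # refractory time
    "eta": 0.01,           # learning rate
    "tau_m": 10e-3,        # slow decay constant (s)
    "tau_s": 2.5e-3,       # fast decay constant (s)
    "target_spikes_ms": [40, 80, 120, 160],   # target spike times (ms)
    "max_epochs": 500,     # training stops at 500 epochs or zero distance
    "n_runs": 20,          # independent runs used for averaging
}
```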

3.2 Classification on the UCI Dataset

Iris. A basic benchmark dataset for plant classification. It contains 3 types of iris plants; each class contains 50 samples, and each sample is described by 4 variables, giving 150 instances in total. 50% of the samples from each class are chosen to build the training set, and the rest are used for testing. We use population coding, as described in [14], to convert the Iris features into spike times: each feature value is encoded by 6 identically shaped, overlapping Gaussian functions, so 4 \(\times \) 6 = 24 input spikes are obtained as the input of 24 synapses. In addition, all patterns have 5 additional input synapses with spikes at fixed times [2, 4, 6.5, 7.5, 10] ms to ensure that the target spikes can be launched. The total number of input neurons is therefore 4 \(\times \) 6 + 5 = 29. There are 50 hidden neurons and 3 output neurons. The total time duration of an input pattern is set to \(T = 10\) ms. The network is trained to trigger a desired spike train of [6.5, 7.5] ms for the correct input category, and to keep silent for the other categories.
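As an illustration of this population coding step, the sketch below (our own; the receptive-field spacing, width, response-to-time mapping and feature ranges are assumptions, not taken from [14]) converts one Iris feature value into 6 spike times.

```python
import numpy as np

def encode_feature(x, x_min, x_max, n_fields=6, T=10.0):
    """Encode one feature value with n_fields overlapping Gaussian receptive fields;
    a strong response maps to an early spike, a weak response to a late spike."""
    centres = np.linspace(x_min, x_max, n_fields)
    sigma = (x_max - x_min) / (n_fields - 1)                 # receptive-field width
    response = np.exp(-0.5 * ((x - centres) / sigma) ** 2)   # values in (0, 1]
    return (1.0 - response) * T                              # spike times in [0, T) ms

# Example: one Iris sample (4 features) -> 4 * 6 = 24 input spike times
sample = np.array([5.1, 3.5, 1.4, 0.2])
feat_min = np.array([4.3, 2.0, 1.0, 0.1])   # approximate feature ranges of the dataset
feat_max = np.array([7.9, 4.4, 6.9, 2.5])
spike_times = np.concatenate([encode_feature(x, lo, hi)
                              for x, lo, hi in zip(sample, feat_min, feat_max)])
```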

Table 1. Comparison with other methods on the Iris dataset

As shown in Table 1, our approach achieves comparable or even higher accuracy (96 ± 1.3%) than traditional neural networks [15, 16]. This result shows that our method is successful in training temporal SNNs. We also compare our method with other spike-based methods. In Table 1, SpikeProp [16], Xie et al. [18, 19], BP-STDP [20] and the proposed method achieve similarly high accuracy on the Iris dataset. However, whereas SpikeProp [16] requires 1000 epochs to converge, the proposed method needs only 120. Although Xie et al. [18, 19] improve the training efficiency and reduce the number of training epochs from 18 to 2, their method is not a true multilayer SNN: only the synaptic weights from input to hidden neurons are adjusted, while all synaptic weights from hidden to output neurons are set to 1. BP-STDP and Multi-ReSuMe use about 75% of the Iris dataset from each class as the training set, whereas we use only 50% of the dataset for training and still significantly improve the classification performance on the testing set. In addition, unlike Taherkhani et al. [8, 10, 16], the proposed method does not need sub-connections and thus reduces the number of weight modifications.

4 Conclusion

This paper proposes a novel supervised multispike learning algorithm for multilayer SNNs, which can fire multiple spikes at precisely desired times in each layer. The proposed method derives its weight update rule from the WH rule and then credits the network error to previous layers by backpropagation. Experimental results show that our method achieves high learning accuracy with a significant improvement in the computational efficiency of the network.