
1 Introduction

Spiking Neural Networks (SNNs) are considered the third generation of Artificial Neural Networks (ANNs) [1]. These networks are formed by spiking neurons, which process information encoded in the timing of events (spikes) rather than in spike rates, as their predecessors do. It has been proved that spiking neural networks are computationally more powerful than sigmoid neural networks [2]. In fact, there is evidence that some functions can be computed with fewer spiking neurons than with neurons of previous generations [1].

As their predecessors, SNNs have been applied to pattern recognition problems. The Spikeprop algorithm is a learning rule for training SNNs based on gradient descent [3]; it is capable of training SNNs to solve complex classification problems. However, this algorithm tends to get trapped in local minima and has several limitations, e.g., it does not allow both positive and negative synaptic weights. The following list summarizes some drawbacks of using Spikeprop for the supervised training of SNNs [4]:

  • The convergence of the algorithm is vulnerable to being caught in local minima.

  • The convergence of the algorithm is not guaranteed, as it depends on the fine tuning of several parameters before the algorithm starts.

  • The structure of the synapse which consists of a fixed number of sub-connections, each of which has a fixed synaptic delay, leads to a drastic increase in the number of connecting synapses and therefore the number of adjustable weights.

  • The huge number of synaptic connections makes it difficult for such algorithms to scale up when the processing of high-dimensional datasets is considered.

  • Having a fixed number of sub-connections with fixed delay values is not necessary and yields many redundant connections.

  • The algorithm entails the problem of ‘silent neurons’; i.e. if the outputs do not fire then the algorithm halts, since no error can be calculated.

To overcome the drawbacks of Spikeprop in the supervised learning of SNNs, some works have proposed using metaheuristics to tune the neural parameters (weights and delays) [4–7]. Two similar state-of-the-art proposals for training SNNs with metaheuristics are compared in this work. Both methodologies are tested on classical benchmarks, the XOR logic gate and the Iris plant dataset (UCI Machine Learning Repository), with different architectures over several experiments.

This chapter is organized as follows: Sect. 2 gives the fundamentals for simulating SNNs, including the neural network structure, the codifier and the spiking neuron model. In Sect. 3, the methodology used for training SNNs is explained. The experimental design and results are shown in Sect. 4. Finally, Sect. 5 presents conclusions and future work.

2 Spiking Neural Networks

A neural network can be defined as an interconnection of neurons, such that neuron outputs are connected, through weights, to all other neurons including themselves; both lag-free and delay connections are allowed [8]. There are several models or topologies of ANNs, which are defined around three aspects: computing nodes, communication links and message types [9].

Feed-forward SNNs were used for this work. They were formed by 3 layers: input, hidden and output (see Fig. 1). Their topology was set as follows: the computing nodes (neuron models) are spiking neurons, specifically neurons of the Spike Response Model (SRM) [10]; this model is explained later. The communication links (synaptic connections, each defined by a synaptic weight and a synaptic delay) were set to form a fully interlayer-connected neural network, where all possible interlayer connections are present and the network contains no intralayer (including self-connections) or supralayer connections [11]; this connection pattern is also known as a fully connected neural network. The message types are based on the time-to-first-spike encoding scheme, which means that each neuron can produce at most one spike during the whole simulation time for each stimulus.

Fig. 1 Top: Spiking neural network with three layers: input, hidden and output. Bottom: Synaptic connections (weights and delays) between presynaptic neurons and the postsynaptic neuron j (figure taken from [6])
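Since each synapse carries both a weight and a delay, the number of free parameters to be tuned grows with the product of layer sizes. The following minimal sketch (the function name and the example 3-5-1 architecture are illustrative assumptions, not taken from the text) shows how that count is obtained for a fully interlayer-connected 3-layer SNN.

```python
# Hypothetical sketch: number of free parameters (one weight and one delay per
# synapse) in a fully interlayer-connected 3-layer feed-forward SNN.
def num_free_parameters(n_input: int, n_hidden: int, n_output: int) -> int:
    n_synapses = n_input * n_hidden + n_hidden * n_output
    return 2 * n_synapses  # each synapse carries a weight and a delay

# Example: an assumed 3-5-1 network has 2 * (3*5 + 5*1) = 40 free parameters.
print(num_free_parameters(3, 5, 1))  # 40
```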

2.1 Gaussian Receptive Fields

The traditional form of patterns, multidimensional raw data consisting of real values, cannot be used directly to feed an SNN in a simulation process. The patterns need to be transformed into temporal patterns (sets of events in time, or spikes, also known as spike trains) before being processed by the SNN.

In [12], an encoding scheme to generate firing times from real values was proposed: the Gaussian Receptive Fields (GRFs). This scheme enables the representation of continuously valued input variables by a population of neurons with graded and overlapping sensitivity profiles, such as Gaussian activation functions (the receptive fields). To encode values into a temporal pattern, it is sufficient to associate highly stimulated neurons with early firing times and less stimulated neurons with later (or no) firing times. Each input variable is encoded independently, which means that each input dimension is encoded by an array of one-dimensional receptive fields; GRFs constitute a biologically plausible and well-studied method for representing real-valued parameters [3].

A detailed definition of the construction and use of GRFs is given in [13]. Each input datum is fed to all the conversion neurons covering the whole data range. For a range \( \left[ {n_{min} ,\;n_{max} } \right] \) of a variable n, m neurons with GRFs are used. For the i-th neuron coding for variable n, the center of the Gaussian function \( C_{i} \) is set according to Eq. (1).

$$ C_{i} = n_{min} + \frac{2i - 3}{2} \times \frac{{n_{max} - n_{min} }}{m - 2}, \quad m > 2 $$
(1)

The width w of each Gaussian function is set according to Eq. (2), where \( 1 \le \gamma \le 2 \) is a multiplication coefficient that is inversely proportional to the width of the Gaussian functions.

$$ w = \frac{{n_{max} - n_{min} }}{{\gamma \left( {m - 2} \right)}} $$
(2)

Using the center and width of each neuron/Gaussian function, the firing magnitude \( f_{i}(x) \) for each sample point of the input x is calculated using Eq. (3).

$$ f_{i} \left( x \right) = e^{{ - \frac{{\left( {x - c_{i} } \right)^{2} }}{{2w^{2} }}}} $$
(3)

The magnitude values are then converted to time delay values by associating the highest magnitude with a value close to zero milliseconds, while neurons with lower firing magnitudes are associated with time delay values close to the maximum simulation time. The conversion from magnitude to time delay values is done using Eq. (4), where τ is the total simulation time.

$$ T_{i} = \left( {1 - f_{i} } \right) \times \tau , \quad i = 1,2, \ldots ,m $$
(4)

Time delay values greater than some value \( \tau_{threshold} < \tau \) are coded not to fire, as the corresponding neurons are considered insufficiently excited. Therefore, neurons with time delay values between zero and \( \tau_{threshold} \) milliseconds carry all the encoded information of the input data.
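The sketch below puts Eqs. (1)-(4) together for a single real value. It is only an illustration: the default values of γ, the simulation time τ and the firing threshold \( \tau_{threshold} \) are assumptions chosen for the example, not parameters reported in the text.

```python
import numpy as np

def grf_encode(x, n_min, n_max, m, gamma=1.5, tau=20.0, tau_threshold=18.0):
    """Encode a real value x into m firing times using Gaussian Receptive Fields
    (Eqs. 1-4). Returns a list of firing times in ms; None marks a neuron that
    is insufficiently excited and therefore does not fire."""
    i = np.arange(1, m + 1)
    width = (n_max - n_min) / (gamma * (m - 2))                      # Eq. (2)
    centers = n_min + (2 * i - 3) / 2.0 * (n_max - n_min) / (m - 2)  # Eq. (1)
    f = np.exp(-((x - centers) ** 2) / (2 * width ** 2))             # Eq. (3)
    t = (1.0 - f) * tau                                              # Eq. (4)
    return [float(ti) if ti <= tau_threshold else None for ti in t]

# Example: encode the value 4.9 of a feature ranging over [4.3, 7.9] with 4 GRFs.
print(grf_encode(4.9, 4.3, 7.9, m=4))
```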

2.2 Spike Response Model

The SRM [10, 14] is an approximation of the dynamics of integrate-and-fire neurons. The neuron status is updated through a linear summation of the postsynaptic potentials resulting from the impinging spike trains at the connecting synapses. A neuron fires whenever its accumulated potential reaches a threshold from below (see Fig. 2) [6].

Fig. 2 Incoming spikes are transformed into postsynaptic potentials; they are delayed and weighted by the synapses and finally integrated by the postsynaptic neuron (figure taken from [5])

Because the spiking neurons in this work use the time-to-first-spike encoding scheme for sending and receiving messages, a reduced version of the SRM is implemented; this reduced model has been used in [3–7, 12].

The reduced SRM is defined, according to [6], as follows. Let us consider that a neuron j has a set \( \varGamma_{j} \) of immediate predecessors, called presynaptic neurons, and receives a set of spikes with firing times \( t_{i} ,i \in \varGamma_{j} \). Neurons fire when their state variable x(t), called membrane potential, reaches a certain threshold Θ. The internal state of a neuron is determined by Eq. (5), where \( w_{ji} \) is the synaptic weight that modulates \( y_{i} (t) \), the unweighted postsynaptic potential of a single spike coming from neuron i and impinging on neuron j.

$$ x_{j} \left( t \right) = \mathop \sum \limits_{{i \in \varGamma_{j} }} w_{ji} y_{i} (t) $$
(5)

The unweighted contribution \( y_{i} \left( t \right) \) is given by Eq. (6), where ε(t) is a function that describes the form of the postsynaptic potential.

$$ y_{i} \left( t \right) = \varepsilon \left( {t - t_{i} - d_{ji} } \right) $$
(6)

The form of the postsynaptic potential is given by Eq. (7), which requires the following parameters: \( t \) is the current time, \( t_{i} \) is the firing time of the presynaptic neuron i, and \( d_{ji} \) is the associated synaptic delay. Finally, the function has a parameter τ, the membrane potential time constant, which defines the decay time of the postsynaptic potential.

$$ \varepsilon \left( t \right) = \left\{ {\begin{array}{*{20}c} {\frac{t}{\tau }e^{{1 - \frac{t}{\tau }}} } \hfill & {if\, t > 0} \hfill \\ 0 \hfill & {else} \hfill \\ \end{array} } \right. $$
(7)

The firing time \( t_{j} \) of neuron j is determined as the first time the state variable crosses the threshold from below. The threshold Θ and τ are constants and equal for all neurons in the network.
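A minimal sketch of the reduced SRM described by Eqs. (5)-(7) follows. The threshold Θ, the time constant τ, the simulation horizon and the time step are assumptions for illustration; the text only states that Θ and τ are constant and shared by all neurons.

```python
import numpy as np

def srm_potential(t, firing_times, weights, delays, tau=3.0):
    """Membrane potential x_j(t) of a postsynaptic neuron (Eqs. 5-7)."""
    x = 0.0
    for t_i, w_ji, d_ji in zip(firing_times, weights, delays):
        if t_i is None:               # presynaptic neuron did not fire
            continue
        s = t - t_i - d_ji            # argument of epsilon, Eq. (6)
        if s > 0:                     # Eq. (7): epsilon(s) = (s/tau) e^{1 - s/tau}
            x += w_ji * (s / tau) * np.exp(1.0 - s / tau)
    return x

def first_spike_time(firing_times, weights, delays, theta=1.0,
                     t_max=20.0, dt=0.1, tau=3.0):
    """First time x_j(t) crosses the threshold theta from below; None = silent neuron."""
    for t in np.arange(0.0, t_max, dt):
        if srm_potential(t, firing_times, weights, delays, tau) >= theta:
            return float(t)
    return None
```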

3 Metaheuristic-Based Supervised Learning

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place [15]. In this case, the learning is driven by a metaheuristic algorithm (see Fig. 3); the free parameters of the SNN are all the weights and delays of each synapse.

Fig. 3 Generic scheme for training SNNs with metaheuristic algorithms

Several works have used metaheuristics for training SNNs [4–7]; each of them defines a particular methodology. The methodology presented next is based on aspects shared by the works reviewed for this chapter.

In metaheuristic-based supervised learning, each individual contains all the free parameters of a previously structured SNN. Every individual is evaluated by means of a fitness function. The first step of the fitness function is a mapping process, which sets the individual's parameters as the weights and delays of the SNN. The second step uses batch training as the learning protocol, where all patterns are presented to the network before learning takes place [16]. The third step calculates an error (to be minimized) according to Eq. (8) (taken from [4]), where T is the set of training patterns, O is the set of output spiking neurons, \( t_{o}^{a} (t) \) is the actual output spike time of the SNN and \( t_{o}^{t} (t) \) is the desired output spike time. The error calculated by the fitness function determines the fitness value of each individual and drives the metaheuristic-based supervised learning.

$$ E = \mathop \sum \limits_{t}^{T} \mathop \sum \limits_{o}^{O} \left( {t_{o}^{a} \left( t \right) - t_{o}^{t} (t)} \right)^{2} $$
(8)
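The three steps of the fitness function can be sketched as follows. The `network.set_parameters` and `network.simulate` calls are hypothetical placeholders for the mapping and simulation routines of a concrete SNN implementation; they are not part of the original text.

```python
def fitness(individual, network, training_patterns, desired_times):
    """Fitness of one individual: map its parameters into the SNN, run all
    training patterns (batch protocol) and accumulate the squared error of Eq. (8)."""
    network.set_parameters(individual)        # step 1: mapping to weights and delays
    error = 0.0
    for pattern, t_target in zip(training_patterns, desired_times):
        t_actual = network.simulate(pattern)  # step 2: actual output spike times
        for t_a, t_t in zip(t_actual, t_target):
            error += (t_a - t_t) ** 2         # step 3: Eq. (8)
    return error
```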

Two metaheuristics from the state of the art used for training SNNs are presented next.

3.1 Evolutionary Strategy

Evolutionary Strategies (ES) [17] deal natively with problems in the real domain. In [4], a self-adaptive ES was designed for training SNNs; in this ES, each population member consists of an n-dimensional vector, where n is the total number of tuneable network parameters within the input, hidden and output layers. The population at any given generation g is denoted as P(g). Each individual is taken as a pair of real-valued vectors, \( \left( {x_{i} ,\eta_{i} } \right) \), where the \( x_{i} \)'s are objective variables representing the synaptic free parameters, and the \( \eta_{i} \)'s are standard deviations for mutations. Each individual generates a single offspring \( \left( {{x^{\prime}}_{i} ,{\eta^{\prime}}_{i} } \right) \), where each variable \( {x^{\prime}}_{i} \left( j \right) \) of the offspring can be defined by either Eq. (9) (local search) or Eq. (10) (global search), and the standard deviation is defined by Eq. (11).

$$ {x^{\prime}}_{i} \left( j \right) = x_{i} \left( j \right) + \eta_{i} \left( j \right)N_{j} (0,1) $$
(9)
$$ {x^{\prime}}_{i} \left( j \right) = x_{i} \left( j \right) + \eta_{i} \left( j \right)\delta_{j} $$
(10)
$$ \eta^{\prime}_{i} \left( j \right) = \eta_{i} \left( j \right)\exp \left( {\tau^{\prime} N(0,1) + \tau N_{j} (0,1)} \right) $$
(11)

where:

  • \( N(0,1) \) denotes a normally distributed one dimensional random number with μ = 0 and σ = 1.

  • \( N_{j} (0,1) \) indicates that the random number is generated anew for each value of j.

  • \( \delta_{j} \) is a Cauchy random variable, and it is generated anew for each value of j (Scale = 1).

  • Factor \( \tau = \frac{1}{{\sqrt {2\sqrt n } }} \)

  • Factor \( \tau^{\prime} = \frac{1}{{\sqrt {2n} }} \)

The self-adaptive ES is presented in Algorithm 1.

Algorithm 1 Self-adaptive ES

1: Generate the initial population of μ individuals.
2: Evaluate the fitness score of each individual \( (x_{i} ,\eta_{i} ) \), \( i = 1, \ldots ,\mu \), of the population based on \( E = \sum\nolimits_{t}^{T} {\sum\nolimits_{o}^{O} {(t_{o}^{a} (t) - t_{o}^{t} (t))^{2} } } \)
3: while the maximum iteration is not reached do
4:   Each parent \( (x_{i} ,\eta_{i} ) \) generates a single offspring \( (x_{i}^{{\prime }} ,\eta_{i}^{{\prime }} ) \)
5:   Calculate the fitness of each offspring \( (x_{i}^{{\prime }} ,\eta_{i}^{{\prime }} ),\;i = 1, \ldots ,\mu \)
6:   Generate a new population P(g) using tournament selection and elitism to keep track of the best individual at each generation
7: end while
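A minimal sketch of Algorithm 1 follows, with the mutations of Eqs. (9)-(11). The initial ranges of x and η, the binary tournament and the choice between Gaussian and Cauchy steps as a single flag are assumptions made for illustration; the original work [4] may differ in these details.

```python
import numpy as np

def self_adaptive_es(fitness_fn, n, mu=30, max_iter=500, use_cauchy=False):
    """Sketch of Algorithm 1: each individual is (x_i, eta_i); offspring are
    produced with Eq. (9) or Eq. (10) and step sizes are updated with Eq. (11)."""
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))        # factor tau
    tau_prime = 1.0 / np.sqrt(2.0 * n)           # factor tau'
    x = np.random.uniform(-1.0, 1.0, (mu, n))    # objective variables (assumed range)
    eta = np.full((mu, n), 0.05)                 # mutation step sizes (assumed init)
    fit = np.array([fitness_fn(xi) for xi in x])
    for _ in range(max_iter):
        # each parent generates a single offspring
        step = (np.random.standard_cauchy((mu, n)) if use_cauchy   # Eq. (10)
                else np.random.randn(mu, n))                       # Eq. (9)
        x_off = x + eta * step
        eta_off = eta * np.exp(tau_prime * np.random.randn(mu, 1)
                               + tau * np.random.randn(mu, n))     # Eq. (11)
        fit_off = np.array([fitness_fn(xi) for xi in x_off])
        # tournament selection over parents and offspring, with elitism
        pool_x = np.vstack([x, x_off])
        pool_eta = np.vstack([eta, eta_off])
        pool_fit = np.concatenate([fit, fit_off])
        winners = [int(np.argmin(pool_fit))]     # elitism: keep the best individual
        while len(winners) < mu:
            a, b = np.random.randint(0, 2 * mu, 2)
            winners.append(a if pool_fit[a] < pool_fit[b] else b)
        idx = np.array(winners)
        x, eta, fit = pool_x[idx], pool_eta[idx], pool_fit[idx]
    best = int(np.argmin(fit))
    return x[best], fit[best]
```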

3.2 Particle Swarm Optimization

Particle Swarm Optimization (PSO) [18] is an algorithm originally used for continuous optimization problems; it is inspired by the behavior of flocks of birds and schools of fish. A version of PSO called Cooperative PSO (CPSO) has been used in [7] for training SNNs. In this metaheuristic, each individual is defined by a velocity vector \( v_{i} \) and a position vector \( x_{i} \).

The vector \( v_{i} \) is updated by Eq. (12), where \( \rho_{1} \) and \( \rho_{2} \) are acceleration coefficients, χ is a constriction factor defined according to Eq. (14), and \( \gamma_{1} \) and \( \gamma_{2} \) are two independent uniform random numbers distributed in the range [0,1]. The inertia weight ω, which controls the impact of the previous velocity, is defined by Eq. (13), where \( \omega_{0} \) is the maximum of ω, \( \omega_{0} - \omega_{1} \) is the minimum of ω, t is the current iteration and \( T_{max} \) is the maximum number of iterations of the PSO.

The vector \( x_{i} \) is updated by Eq. (15).

$$ v_{id} = \chi \left( {\omega v_{id} + \rho_{1} \gamma_{1} \left( {p_{id} - x_{id} } \right) + \rho_{2} \gamma_{2} \left( {p_{gd} - x_{id} } \right)} \right) $$
(12)
$$ \omega = \omega_{0} - \omega_{1} \frac{t}{{T_{max} }} $$
(13)
$$ \chi = \frac{2}{{\left| {2 - \varphi - \sqrt {\varphi^{2} - 4\varphi } } \right|}};\varphi = \rho_{1} + \rho_{2} , \varphi > 4 $$
(14)
$$ x_{id} = x_{id} + v_{id} $$
(15)

The CPSO is presented in Algorithm 2.

Algorithm 2 CPSO-based supervised learning

1: Generate the initial swarm.
2: for all particles i do
3:   \( x_{id} \leftarrow rand\left( {x^{min} ,x^{max} } \right) \)
4:   \( v_{id} \leftarrow rand\left( {v^{min} ,v^{max} } \right) \)
5:   The past best solution is initially set to \( x_{i} \).
6: end for
7: while the maximum iteration is not reached do
8:   Compute the fitness of each particle at its current position
9:   Run the spiking neural network for each input pattern and obtain the spike times of the output neurons
10:  Calculate the squared error using \( E = \sum\nolimits_{t}^{T} {\sum\nolimits_{o}^{O} {(t_{o}^{a} (t) - t_{o}^{t} (t))^{2} } } \) as the fitness of the particle
11:  Find the global best particle g with the minimal fitness in the swarm at the current iteration
12:  \( v_{id} \leftarrow \chi \left( {\omega v_{id} + \rho_{1} \gamma_{1} \left( {p_{id} - x_{id} } \right) + \rho_{2} \gamma_{2} \left( {p_{gd} - x_{id} } \right)} \right) \)
13:  \( x_{id} \leftarrow x_{id} + v_{id} \)
14: end while
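The sketch below follows Algorithm 2 with the updates of Eqs. (12)-(15). The position and velocity ranges, the inertia bounds \( \omega_{0} \) and \( \omega_{1} \), and the default acceleration coefficients (set here to the Iris values, ρ1 = ρ2 = 2.1, so that φ > 4) are assumptions for illustration.

```python
import numpy as np

def cpso_train(fitness_fn, n, swarm_size=30, max_iter=500,
               rho1=2.1, rho2=2.1, omega0=0.9, omega1=0.5,
               x_range=(-1.0, 1.0), v_range=(-0.5, 0.5)):
    """Sketch of Algorithm 2: constricted PSO over the SNN free parameters."""
    phi = rho1 + rho2                                            # must satisfy phi > 4
    chi = 2.0 / abs(2.0 - phi - np.sqrt(phi ** 2 - 4.0 * phi))   # Eq. (14)
    x = np.random.uniform(x_range[0], x_range[1], (swarm_size, n))
    v = np.random.uniform(v_range[0], v_range[1], (swarm_size, n))
    p_best = x.copy()                                            # past best positions
    p_fit = np.array([fitness_fn(xi) for xi in x])
    for t in range(max_iter):
        omega = omega0 - omega1 * t / max_iter                   # Eq. (13)
        fit = np.array([fitness_fn(xi) for xi in x])
        improved = fit < p_fit
        p_best[improved], p_fit[improved] = x[improved], fit[improved]
        g = p_best[np.argmin(p_fit)]                             # global best particle
        r1 = np.random.rand(swarm_size, n)
        r2 = np.random.rand(swarm_size, n)
        v = chi * (omega * v + rho1 * r1 * (p_best - x)
                   + rho2 * r2 * (g - x))                        # Eq. (12)
        x = x + v                                                # Eq. (15)
    return p_best[np.argmin(p_fit)], float(p_fit.min())
```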

4 Experiments and Results

Classical pattern recognition benchmarks were used for experimentation: the XOR logic gate and the Iris plant dataset. For each benchmark, 5 SNNs with different layer configurations were used; each of them was trained (using both metaheuristics) 35 times. For each benchmark, the training of each SNN configuration using ES and CPSO is contrasted by computing the minimum, maximum, median, mean and standard deviation of the fitness values.

The parameters of the metaheuristics common to all experiments and trainings are 30 individuals and 15,000 function calls.

4.1 XOR Logic Gate

For the XOR logic gate, it was not necessary to use GRFs. Input values were encoded manually; the encoded inputs and the desired output spike times are shown in Table 1.

Table 1 Configuration used for solving XOR

The CPSO parameter values were set empirically to \( \rho_{1} = 0.8 \) and \( \rho_{2} = 3.3 \). The results for the training of each configuration are shown in Table 2. For this benchmark a testing phase was not necessary, since any fitness value greater than 0 yields a poor performance of the SNN. Taking the median value as the representative training error of each metaheuristic over all configurations, it can be inferred that ES outperforms CPSO in the training phase for this problem.

Table 2 Results of fitness values for training SNNs by using ES and CPSO to solve XOR Logic Gate

4.2 Iris Plant Dataset

The Iris plant dataset contains 3 classes, of which 2 are not linearly separable; each class comprises 50 patterns, each described by 4 features. For this benchmark, each feature was encoded by GRFs using 4 encoding neurons. The desired output spike times for the setosa, versicolor and virginica classes are 6, 10 and 14 ms, respectively. The dataset was divided into 2 equal parts (2-fold cross validation), and 2 tests were performed.
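As a usage sketch, an Iris pattern can be converted into input spike times with the grf_encode function given earlier, using 4 GRF neurons per feature; the feature ranges below are the approximate minima and maxima of the Iris features and are listed here as assumptions, since the text does not state them.

```python
# Usage sketch: encode one Iris sample with grf_encode (defined above),
# 4 GRF neurons per feature; ranges are approximate assumptions.
iris_ranges = [(4.3, 7.9), (2.0, 4.4), (1.0, 6.9), (0.1, 2.5)]
sample = [5.1, 3.5, 1.4, 0.2]                     # one setosa pattern
spike_train = [grf_encode(value, lo, hi, m=4)     # 4 x 4 = 16 input firing times
               for value, (lo, hi) in zip(sample, iris_ranges)]
print(spike_train)
```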

The CPSO parameter values were set empirically to \( \rho_{1} = 2.1 \) and \( \rho_{2} = 2.1 \). For the first test instance, Tables 3 and 4 show the optimization phase and the classification phase, respectively; for the second test instance, Tables 5 and 6 show the optimization phase and the classification phase, respectively. The classification phase was performed using the configurations given by the minimum and maximum fitness values of the training phase; the best performances correspond to trainings with low fitness and the worst performances to trainings with high fitness. As for the XOR logic gate, taking the median value as the representative training error of each metaheuristic over all configurations, it can be inferred that ES outperforms CPSO in the training phase for this problem. In general, the best classification performance, on both known and unknown patterns, is observed in the trainings made with ES.

Table 3 Results of fitness values for training SNNs by using ES and CPSO to solve Iris plant dataset (1st fold)
Table 4 Results of performance of SNN trained with ES and CPSO (1st fold)
Table 5 Results of fitness values for training SNNs by using ES and CPSO to solve Iris plant dataset (2nd fold)
Table 6 Results of performance of SNN trained with ES and CPSO (2nd fold)

5 Conclusions and Future Work

This work compares two metaheuristics for the training of SNNs. The small number of ES parameters made it easier to implement as a training algorithm than CPSO. Moreover, ES generally achieves better fitness values than CPSO, and the SNNs trained with ES also achieve better classification performance.

The results show evidence of a relationship between the neural architecture and the capability of the metaheuristic to train it: good results in the training and testing phases can be achieved by different architectures, but some neural configurations are easier to train and show more stability across experiments. Even when a metaheuristic achieves low fitness values, this does not ensure good performance of the SNN in classifying unseen data; this is an important aspect that needs to be analyzed when using metaheuristic-based supervised learning on SNNs.

As future work, the authors propose to improve the CPSO performance on the training of SNNs by tuning its parameters. It would also be interesting to use other metaheuristic algorithms as learning rules and analyze their behavior. Finally, statistical tests are needed for a better comparison of these algorithms on this task.