1 Introduction

The sigmoidal neural network, referred to as the second generation of Artificial Neural Network (ANN) [1], enjoys great popularity in the field of computational intelligence, especially in the domain of classification, owing to its ability to handle non-linear data and its well-known back-propagation learning algorithm. Over time, algorithms based on the metaheuristic approach have also been developed to improve the optimisation of synaptic weights in ANN. Examples include training ANN with Asexual Reproduction Optimization (ARO) [2], where ARO first performs a global search for optimum synaptic weights and back-propagation then reduces the overall training error, and training ANN with the Butterfly Optimization Algorithm (BOA) [3], where BOA is applied to speed up the convergence rate and to reduce the risk of stagnating in a local minimum. Several other evolutionary algorithms have been used to train ANN efficiently for classification problems, including neuroevolution-based methods for specific robotic tasks; a review of such algorithms is given in [4].

However, the traditional rate-coding view of how neurons in ANN share information through synapses has been shown to be unlikely in neuroscience [5, 6], and for this reason ANN fails to faithfully mimic its role model, the human brain. Moreover, to solve non-linear classification problems, a sigmoidal neural network relies on hidden layer(s) to construct the separating hyperplane, which threatens the computational cost for large datasets, because there is no general protocol for selecting the optimal number of hidden layers or the number of hidden neurons within them. Unlike ANN, the Spiking Neural Network (SNN), the third generation of neural networks [1], works in a fundamentally different manner that focuses more on biological plausibility, energy efficiency, and computational cost. SNN operates in a manner more similar to the human brain than its predecessor ANN does. Spiking neurons are computationally powerful (they use precise temporal information) [7], energy efficient, and biologically plausible [8]. They are powerful because a single spiking neuron can construct the separating hyperplane needed to solve a non-linear classification problem.

Although neuroscience asserts that the brain uses temporal coding, the precise coding mechanism is not yet clear [9, 10]; population coding is most commonly used by researchers to carry out experiments with spiking neurons [11]. The biological plausibility of a spiking neuron lies in its firing behaviour, synaptic connections, and learning behaviour. A spiking neuron tracks the potential difference between the inside and outside of the cell membrane, akin to a biological neuron, and when this internal state, or membrane potential, crosses a certain threshold the neuron fires a spike. The synapses fully connect every pre-synaptic (information-sending) neuron to every post-synaptic (information-receiving) neuron. The Post-Synaptic Potential (PSP) of a post-synaptic neuron depends on the pre-synaptic spike times, synaptic weights, and synaptic delays. Although several spiking neuron models exist, such as the Spike Response Model (SRM) [12, 13] and Hodgkin-Huxley [14], the Leaky-Integrate-and-Fire (LIF) neuron [15, 16], an improvement over the Integrate-and-Fire neuron [17], is computationally simpler than the others. Note that a LIF neuron can also be converted to an SRM neuron, as stated in [8].

The spike-firing behaviour of the spiking neuron model used in this paper is described by the simple LIF neuron. In addition, the synapse model uses a double decaying kernel function with a one-to-one mapping between pre-synaptic and post-synaptic neurons. The spike times produced by the time-to-first-spike encoding scheme [11] are used as the spike times of the input neurons. Over the last couple of decades, as SNN has grown in popularity, researchers have paid increasing attention to the development of efficient supervised learning algorithms.

In [11], Bohte et al. proposed a gradient-based algorithm called SpikeProp, quite similar to the back-propagation algorithm, which uses hidden layers in a multilayer feed-forward topology to solve non-linear classification problems. The limitations of SpikeProp include its inability to utilise a mixture of inhibitory and excitatory neurons, which would provide a better convergence rate; this is a barrier to mimicking the biological neuron. In addition, it stagnates in local minima and possesses a slow convergence rate. SpikeProp uses the time-to-first-spike encoding method [11] and the SRM neuron model [12, 13]. The Tempotron, proposed in [18], makes a spiking neuron learn spike-time based decisions; however, rather than precisely training the output neuron, it acts only as a decider (the neuron fires or not), and it lacks a good balance between biological plausibility and computational cost. SWAT, proposed in [19], trains spiking neurons to classify non-linear patterns precisely into their respective classes. An interesting aspect of SWAT is its use of a dynamic synapse model [20] that works in terms of long-term plasticity. Although SWAT scores well on biological plausibility, it lags behind in computational cost: it has a huge number of network parameters to adjust, which makes the algorithm unable to run on moderate computational power. In [21], the remote-supervision-based ReSuMe algorithm, which uses Hebbian learning, was proposed and thoroughly investigated for supervised learning. Many other algorithms, such as SPAN [22], Chronotron [23], a temporal-coding-based supervised learning rule [24], SRESN [25], and OSNN [26], have been proposed to train SNN efficiently. However, from this literature review we can infer that the main advantage of SNN over the sigmoidal neural network, namely the ability to solve non-linear classification problems efficiently with a single spiking neuron, has not been highlighted. In addition, all the aforementioned algorithms exhibit an improper trade-off between biological plausibility and computational cost. Note that the SEFRON algorithm [27] does explore the use of a single spiking neuron, but its number of encoding neurons and synapses can be reduced further by half without hampering the accuracy of the system. In this paper, a single LIF neuron with fewer network parameters than SEFRON is used and experimentally shown to be better in computational cost, classification accuracy, and stability.

The popularity of a classification algorithm rests largely on its learning principle, but developing an efficient learning algorithm is a challenging task. Owing to the advantages of the metaheuristic approach, namely simplicity, flexibility, avoidance of local optima, and a derivative-free mechanism, a nature-inspired, leadership-based metaheuristic called the Grey Wolf Optimizer (GWO) [28] is used in this research to optimise the randomly initialised synaptic weights. The literature shows that various supervised learning algorithms lack a well-balanced trade-off between computational cost and biological plausibility, and that using too many network parameters does not improve accuracy drastically. Note that spiking neurons can solve non-linear classification problems without any hidden layers or hidden neurons, but this is under-explored because of the lack of a proper mechanism for optimising the synaptic weights to achieve low error. Therefore, we focus on a single LIF neuron and its learning rule by introducing a GWO-tuned error function obtained from the LIF neuron, i.e., the WOLIF classifier, which outperforms state-of-the-art methods.

Major contributions of this research paper are:

  1. Proposed the WOLIF classifier, which uses the GWO algorithm to finely tune the error function derived from the output of the LIF neuron.

  2. Reduced the number of network parameters by removing hidden layer(s) and working with fewer encoding neurons.

  3. Utilised static long-term synaptic weights (a combination of both inhibitory and excitatory) to add a dimension of biological plausibility.

  4. Optimised the use of the total simulation time to improve computational cost.

The rest of this paper is organised into sections and subsections as follows. The mapping of real-valued features into temporal spikes, the structure of the classifier, and the spike-firing behaviour of the neuron are discussed in Section 2. The optimisation mechanism for the synaptic weights, through which the neuron learns from data, and a description of the benchmark datasets are given in Sections 3 and 4, respectively. Experimental results and their interpretation are presented in Section 5. Finally, the conclusions drawn from this research are briefly summarised in Section 6.

2 Organisation and architecture of WOLIF

A few things require attention when utilising spiking neurons in a classifier: mapping real-valued features into temporal spikes, selecting the synapse model, selecting the neuron model, and adjusting the connection strengths between neurons to an optimum level so that the neuron can learn from the data properly. In this section, we discuss the procedure for mapping real-valued features into temporal spikes, followed by the architecture of the proposed model and the spike-firing capability of the neuron.

2.1 Mapping real features into temporal spikes

The real-valued features xf, where f ∈ [1,F] (F is the total number of real-valued features), are converted into a set of pre-synaptic spike times using the population encoding scheme [11] with η encoding neurons. Each neuron is allowed to fire a spike only once. The values of xf are thus converted into spike times tm ∈ [0,Tref] (where m = F × η, and Tref is the maximum encoding time, set to 1 ms). Note that Tref − 0 is the encoding time interval, denoted ΔT, so the value of ΔT is also 1 ms. Each receptive-field neuron q (q ∈ [1,η]) has a firing strength \(\mathcal {G}_{f}^{q}\) for the input data xf, computed using (1).

$$ \mathcal{G}_{f}^{q} = exp \Bigg(-\frac{(x_{f} - \mu_{q})^{2}}{2\sigma^{2}}\Bigg) $$
(1)

where μq is the centre of the q-th Gaussian function, calculated from (2), and σ is the standard deviation, given by (3).

$$ \mu_{q} = \Bigg(\frac{2q-3}{2}\Bigg) \times \Bigg(\frac{1}{\eta-2}\Bigg) $$
(2)
$$ \sigma = \frac{1}{\upbeta} \times \Bigg(\frac{1}{\eta-2}\Bigg) $$
(3)

where β is the adjustment factor that controls the overlap of the Gaussian curves. Finally, xf is converted into a set of pre-synaptic spike times tm using (4) and the response values computed from (1).

$$ t_{m} = T_{ref} \times \Bigg(1-\mathcal{G}_{f}^{q}\Bigg) $$
(4)
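To make the encoding pipeline concrete, the sketch below implements (1)-(4) for a single feature. It is a minimal illustration, not the authors' code: the value β = 1.5 is an assumed placeholder (the values actually used appear in Table 4), and spike times are snapped to the 0.1 ms grid mentioned in the caption of Fig. 1.

```python
# Minimal sketch of population encoding, eqs. (1)-(4).
# beta = 1.5 is an illustrative assumption; see Table 4 for the paper's values.
import numpy as np

def encode_feature(x_f, eta=3, beta=1.5, t_ref=1.0, dt=0.1):
    """Convert one real-valued feature into eta pre-synaptic spike times (ms)."""
    q = np.arange(1, eta + 1)                         # receptive-field index q
    mu = ((2 * q - 3) / 2) * (1 / (eta - 2))          # centres mu_q, eq. (2)
    sigma = (1 / beta) * (1 / (eta - 2))              # shared width, eq. (3)
    g = np.exp(-(x_f - mu) ** 2 / (2 * sigma ** 2))   # firing strengths, eq. (1)
    t = t_ref * (1 - g)                               # spike times, eq. (4)
    return np.round(t / dt) * dt                      # snap to the 0.1 ms grid

# One feature fans out into eta spike times in [0, 1] ms.
print(encode_feature(0.5))
```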

Figure 1a shows the generation of response values from Gaussian receptive fields for the iris dataset, which is used for benchmarking. In Fig. 1b, each pre-synaptic neuron and its corresponding spike time are shown for one temporal pattern drawn from the encoded iris spike times.

Fig. 1

a Illustration of the Gaussian response values yielded by three encoding neurons, represented by Curve 1, Curve 2, and Curve 3, upon supplying real-valued features. Here, for the real-valued feature 1.4, three response values, 0.04 (Curve 1), 0.61 (Curve 2), and 0.88 (Curve 3), are obtained. These response values are converted into spike times of 1.0 ms, 0.4 ms, and 0.1 ms within the interval [0, 1] using (4), rounded to the nearest time step of 0.1 ms. b Raster plot of pre-synaptic neurons for the iris dataset with their respective spike times

2.2 Encoding of temporal output spikes

Non-linear data are not easily separable, making it difficult to segregate patterns into their respective classes. Proper labelling of the output spike times is therefore very important: they must be chosen so that each class is well separated from the others and easy to discriminate. Output spike times in SNN are generally selected on a trial-and-error basis within the total simulation time. For binary classification, we selected output spike times of 1 ms for Class 1 and 2 ms for Class 2. For multi-class (3-class) classification, such as the iris dataset, we selected 1 ms for Class 1, 2 ms for Class 2, and 3 ms for Class 3. Similarly, for a 4-class problem such as the Wireless indoor localization dataset, 1 ms represents Class 1, 2 ms Class 2, 3 ms Class 3, and 4 ms Class 4. Since the time step is 0.1 ms, the target classes are well separated in all cases.
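In code, this labelling reduces to a fixed lookup; a minimal sketch of exactly the mapping stated above:

```python
# Class k is assigned the desired first-spike time of k ms, as described above.
TARGET_SPIKE_MS = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # class label -> t_d (ms)
```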

2.3 Architecture of the WOLIF

Figure 2 shows the architecture of the WOLIF classifier consisting of one input layer and one output layer. There are m = F × η input neurons in the input layer and only one output neuron in the output layer. The output neuron is the LIF neuron.

Fig. 2

Architecture of the WOLIF classifier. Pre-synaptic spikes t0, t1, t2,...,tm (t0 is the spike time of the bias neuron, i.e., 0 ms) lie in the interval ti ∈ [0,Tref] ms ∀i; these spike times are transmitted through the double decaying synapse model, after being multiplied by the synaptic weights w0, w1, w2,...,wm, towards the single LIF neuron. At \(\hat {t}\) ms, the membrane potential crosses the threshold of 1 mV. The value of \(\hat {t}\) relative to the target spike times 1 ms, 2 ms, 3 ms, and 4 ms then determines the class of a particular temporal pattern

Although the WOLIF classifier is designed for classifying non-linear temporal patterns, it needs no hidden layer(s) or hidden neurons, because the single output LIF neuron is tuned very efficiently using the GWO algorithm. Figure 3 shows the synapse model between pre-synaptic neuron 1 and the sole output neuron j, and between pre-synaptic neuron 2 and the same output neuron j. The synapse model is based on the double decaying kernel function described by (7). Both excitatory and inhibitory synapses are used, contributing their respective excitation or inhibition to the sub-threshold regime of the membrane potential: an excitatory synapse produces an Excitatory Post-Synaptic Potential (EPSP), i.e., an increase in the potential, and an inhibitory synapse produces an Inhibitory Post-Synaptic Potential (IPSP), i.e., a decrease. In Fig. 3a, the discrete spike time t1 is converted into a continuous curve (the input stimulus) by the double decaying kernel function of (7); an EPSP is formed after the positive weight W1 is multiplied with the input stimulus ξ(t − t1), so the amplitude of the PSP rises. Conversely, in Fig. 3b an IPSP is formed after the negative weight W2 is multiplied with the input stimulus ξ(t − t2), so the amplitude of the PSP declines. At \(\hat {t}\) ms, the membrane potential reaches the threshold of 1 mV, and at \((\hat {t}+\delta t)\) ms it returns to rest. Since we consider only the first spike, the PSP remains zero after that spike is fired. The output spike time \(\hat {t}\) ms decides the class of a particular temporal pattern.

Fig. 3

a Excitatory synapse model. b Inhibitory synapse model

2.4 Spike firing behaviour of LIF neuron

The mechanism of firing a single spike is described in this section. The activity of the sub-threshold regime of the single output LIF spiking neuron, upon receiving weighted inputs from the pre-synaptic neurons through the double decaying synapse model, is characterised by (5).

$$ \phi(t) = \phi(t-\delta t) + \sum\limits_{i=1}^{m} \mathbf{W_{i}} \times \psi_{i}(t) $$
(5)

where m is the number of synapses connecting the pre-synaptic neurons, ϕ(t) is the internal state of the single output neuron at time t, δt is the time step, ϕ(t − δt) is the internal state at time t − δt (at t = δt ms the neuron is at rest), W ∈ IR^m is the synaptic weight vector, and ψi(t) is the input stimulus received by the output neuron from pre-synaptic neuron i. The synapse model that produces the values of ψi(t) is defined in (6).

$$ \psi_{i}(t) = \xi(t-t_{i}), \qquad \delta t \leq t \leq T $$
(6)

The definition of the double decaying kernel function is given in (7).

$$ \xi(t-t_{i}) = \left\{\begin{array}{ll} exp \left( \frac{t_{i}-t}{\tau_{m}} \right) - exp \left( \frac{t_{i}-t}{\tau_{s}} \right), & \text{if } t>t_{i} \\ 0, & \text{if } t \leq t_{i} \end{array}\right. $$
(7)

where τm is the time constant of the cell membrane and τs (0 < τs < τm) is another time constant, called the synaptic time constant. The values of τm and τs control the rise and decay of the PSP, respectively. The spike times for all patterns presented to the output neuron are described by the set \({\Gamma }_{i_{p}}\) given in (8).

$$ {\Gamma}_{i_{p}} = \left \{ t_{i}: 1 \leq i \leq m \right \} = \left\{ t : \phi (t) \geq {\Theta}\right \} $$
(8)

where p indexes the input patterns presented to the network for training, and Θ is the threshold value.
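The following is a minimal sketch of this firing mechanism, assuming the incremental update of (5) with the kernel of (7) and the parameter values reported later in the paper (τm = 1.1 ms, τs = 0.55 ms, Θ = 1 mV, δt = 0.1 ms); the function names are illustrative.

```python
# Minimal sketch of the LIF spike-firing mechanism, eqs. (5)-(8).
import numpy as np

def kernel(t, t_pre, tau_m=1.1, tau_s=0.55):
    """Double decaying kernel xi(t - t_i), eq. (7); zero for t <= t_i."""
    s = t - t_pre
    return np.where(s > 0, np.exp(-s / tau_m) - np.exp(-s / tau_s), 0.0)

def first_spike_time(t_pre, w, T=2.0, dt=0.1, theta=1.0):
    """Accumulate phi(t) per eq. (5) and return the first threshold crossing
    (eq. (8)), or None if the neuron stays silent over [0, T]."""
    phi = 0.0
    for t in np.arange(dt, T + dt / 2, dt):
        phi += np.sum(w * kernel(t, t_pre))   # weighted input stimuli
        if phi >= theta:                      # phi(t) >= Theta
            return round(float(t), 1)         # output spike time t_hat
    return None
```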

3 Learning rule for WOLIF classifier

In this section, the tuning principle of the synaptic weights using the GWO algorithm [28] is discussed. The fine tuning of the network parameters in an optimised manner constitutes the learning of WOLIF. The primary objective of learning is to optimise the randomly initialised weights efficiently, so that each synapse's contribution to the membrane potential in the sub-threshold regime is optimised, resulting in lower error. From the synapse model shown in Fig. 3, observe that the synaptic weights form a vector W rather than a matrix, owing to the use of only one output neuron. The weight vector W ∈ [-0.25, 1] guarantees that the synapse structure is a combination of both excitatory and inhibitory synapses. The main purpose of tuning W with the GWO algorithm [28] is to minimise the difference between the desired spike times \(\hat {t}_{d}^{y}\) and the actual spike times \(\hat {t}_{a}^{y}\) (where y varies from 1 to X, the total number of samples presented to WOLIF). In the GWO algorithm, leadership is distributed among four types of wolves, namely α, β, δ, and ω; we did not use the ω wolf for searching the optimum solution in the search space. The algorithm balances exploitation (attacking prey) and exploration (searching for prey) very well, which ensures that it is unlikely to suffer premature convergence or become trapped in a local optimum. The effective use of the GWO algorithm with the LIF neuron is described in Algorithm 1, and the calculation of the fitness function for the search agents is described in Algorithm 2. In this research, a total of 30 search agents, denoted S, are used to search the space effectively for the optimum solution. We observed that increasing the number of search agents did not really improve the classification accuracy but did increase the network load, so we did not vary their number.

Initially, αs (the score of the α wolf), βs (the score of the β wolf), and δs (the score of the δ wolf) are set to \(\infty \). Then, according to the value of \({f_{l}^{E}}\) (E is the total number of epochs), the values of αs, βs, and δs, as well as αp (the position of the α wolf), βp (the position of the β wolf), and δp (the position of the δ wolf), are updated as shown in Algorithm 1. The searching coefficient A depends on the value of a, which controls the convergence or divergence behaviour. The value of a decreases linearly from 2 to 0, balancing exploration and exploitation. The random searching coefficient A is given by (9).

$$ A = 2 \times a \times R - a $$
(9)

Another random searching co-efficient C is characterised by the (10).

$$ C = 2 \times R $$
(10)

where R is a random number ∈ [0, 1]. When |A| < 1, the algorithm converges towards the optimum solution, and when |A| > 1, it diverges from it. The positions, i.e., the synaptic weights in our case, are updated according to (11), (12), and (13).

$$ \mathbf{W_{1}} = \alpha_{p} - A_{1} \times D_{\alpha} $$
(11)

where A1 is a random searching coefficient calculated using (9), and Dα = |C1 × αp − W| (C1 is another random searching coefficient, calculated using (10)).

$$ \mathbf{W_{2}} = {\upbeta}_{p} - A_{2} \times D_{\upbeta} $$
(12)

where A2 is a random searching coefficient distinct from A1, calculated using (9), and Dβ = |C2 × βp − W| (C2 is a random searching coefficient distinct from C1, calculated using (10)).

$$ \mathbf{W_{3}} = \delta_{p} - A_{3} \times D_{\delta} $$
(13)

where A3 is a random searching coefficient distinct from A1 and A2, also calculated using (9), and Dδ = |C3 × δp − W| (C3 is a random searching coefficient distinct from C1 and C2, also calculated using (10)). Finally, the weights for the next iteration, Wl+1 (l varies from 1 to E), are updated using (14).

$$ \mathbf{W_{l+1}} = \frac{\mathbf{W_{1}+W_{2}+W_{3}}}{3} $$
(14)
Algorithm 1 Training of WOLIF: GWO-based optimisation of the LIF neuron's synaptic weights
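The core of Algorithm 1 is the position update of (9)-(14) applied to the weight vector. The sketch below is a minimal rendering of that update, not the authors' implementation; drawing fresh random coefficients per dimension is an assumption (common in GWO implementations), and the function name is illustrative.

```python
# Minimal sketch of one GWO position update over the weight vector, eqs. (9)-(14).
import numpy as np

def gwo_update(w, alpha_p, beta_p, delta_p, a):
    """Return W_{l+1} from the current weights and the three leaders' positions."""
    def move(leader_p):
        A = 2 * a * np.random.rand(*w.shape) - a  # searching coefficient, eq. (9)
        C = 2 * np.random.rand(*w.shape)          # searching coefficient, eq. (10)
        D = np.abs(C * leader_p - w)              # distance to the leader
        return leader_p - A * D                   # eqs. (11)-(13)
    w1, w2, w3 = move(alpha_p), move(beta_p), move(delta_p)
    return (w1 + w2 + w3) / 3                     # averaged position, eq. (14)

# a decreases linearly from 2 to 0 across the E epochs: a = 2 * (1 - l / E)
```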

At the convergence epoch, WOLIF finds the optimum values of the synaptic weights \(\mathbf {W}_{E^{\prime }}\) (where \(E^{\prime }\) is the convergence epoch) by following the objective function given in (15).

$$ O(\alpha_{s}) = \frac{1}{1+\alpha_{s}} $$
(15)

where αs depends on the loss function \({\mathscr{L}}(\hat {t}_{d}^{y}, \hat {t}_{a}^{y})\), which is treated as the fitness function in this research and is defined in (16). The loss is calculated as the Mean Squared Error (MSE).

$$ \mathcal{L}(\hat{t}_{d}^{y}, \hat{t}_{a}^{y}) = \frac{1}{X} \sum\limits_{y=1}^{X} (\hat{t}_{a}^{y} - \hat{t}_{d}^{y})^{2} $$
(16)

Maximising the objective function O(αs) requires minimising αs, which in turn means minimising the loss function \({\mathscr{L}}(\hat {t}_{d}^{y}, \hat {t}_{a}^{y})\). After training, the optimised weights \(\mathbf {W}_{E^{\prime }}\) are used to test the performance of the WOLIF classifier on a new set of testing samples.

Algorithm 2 Calculation of the fitness function for the search agents
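Algorithm 2 amounts to evaluating (16) over the training samples and scoring the best agent with (15); a minimal sketch, with illustrative function names:

```python
# Minimal sketch of the fitness (eq. (16)) and objective (eq. (15)).
import numpy as np

def mse_loss(t_desired, t_actual):
    """MSE between desired and actual first-spike times over X samples, eq. (16)."""
    t_desired, t_actual = np.asarray(t_desired), np.asarray(t_actual)
    return np.mean((t_actual - t_desired) ** 2)

def objective(alpha_s):
    """O(alpha_s), eq. (15); approaches 1 as the best loss alpha_s approaches 0."""
    return 1.0 / (1.0 + alpha_s)
```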

4 Benchmarking datasets

We have used five binary datasets (Breast cancer, Ionosphere, Liver disorders, Pima diabetes, and Banknote authentication) and two multi-class datasets (Iris flower and Wireless indoor localization) to benchmark the WOLIF classifier.

4.1 Breast cancer

The Breast cancer (WBC) dataset, obtained from the University of Wisconsin Hospital [29], is based on breast cytology obtained via fine-needle aspiration. The WBC dataset consists of 699 samples, of which 16 contain missing values; these are removed, leaving a total of 683 samples for this research. Of these, 444 samples belong to the Benign class and 239 to the Malignant class [30]. The 9 real-valued features of WBC are converted into 9 × 3 + 1 = 28 pre-synaptic input spike times (3 encoding neurons per feature plus 1 bias neuron). Hence the network topology for the WBC dataset is 28:1, with 28 input neurons and 1 output neuron. The output spike times are 1 ms for the Benign class and 2 ms for the Malignant class.

4.2 Ionosphere

The Ionosphere dataset is a collection of radar data, gathered by an antenna, for classifying the condition of the ionosphere as Good or Bad. There are a total of 351 samples, each with 33 attributes representing features [31, 32]. The network topology for Ionosphere is 33 × 3 + 1 = 100 input neurons and 1 output neuron. A spike time of 1 ms represents a Good ionosphere condition and 2 ms a Bad condition.

4.3 Liver disorders

The Liver disorders dataset consists of a total of 345 samples, each with 6 attributes describing its features [32, 33]. The dataset classifies the condition of the liver into two classes, namely Healthy and Unhealthy. The network topology has 6 × 3 + 1 = 19 input neurons (3 encoding neurons per feature plus 1 bias neuron) and 1 output neuron. The output spike time of 1 ms assigns a sample to the Healthy class and 2 ms to the Unhealthy class.

4.4 Pima diabetes

The Pima Indian diabetes dataset is a collection of attributes for predicting whether a person suffers from diabetes. All patients are females of Pima Indian heritage. There are 768 samples, each with 8 real-valued features [32], which are converted into a total of 8 × 3 + 1 = 25 pre-synaptic spike times (3 encoding neurons per feature plus 1 bias neuron). The output spike time of 1 ms classifies a patient as Diabetic and 2 ms as Non-diabetic. The network topology follows a 25:1 structure, with 25 input neurons and 1 output neuron. A brief summary of the datasets, including features, classes, and the numbers of training and testing samples, is presented in Table 1.

Table 1 A brief summary of all datasets used for benchmarking

4.5 Banknote authentication

The Banknote authentication dataset is a non-linear binary classification problem consisting of a total of 1372 samples [32]. For this dataset, 80% of the samples were used for training and 20% for testing. The task is to classify whether a bank note is authentic. Prediction is based on 4 real-valued features, which are mapped into 4 × 3 + 1 = 13 pre-synaptic input spike times (3 encoding neurons per feature plus 1 bias neuron). The output spike time of 1 ms represents the Authentic class and 2 ms the Non-authentic class. The network topology follows a 13:1 structure, with 13 input neurons and 1 output neuron.

4.6 Iris flower

The Iris dataset is a three-class non-linear problem with 150 samples, 50 per class [32, 34]. The three classes represent species of the Iris plant, namely Setosa, Versicolor, and Virginica. Its four real-valued features are mapped into 4 × 3 + 1 = 13 pre-synaptic input spike times (3 encoding neurons per feature plus 1 bias neuron). The output spike time of 1 ms represents the Setosa species, 2 ms Versicolor, and 3 ms Virginica. The network topology for the Iris dataset is 13:1 (13 input neurons and 1 output neuron).

4.7 Wireless indoor localization

Wireless indoor localization is a multi-class non-linear classification problem with 4 classes and 2000 samples in total [35, 36]. Here, 80% of the samples were used for training and 20% for testing. The 7 real-valued attributes representing features are mapped into 7 × 3 + 1 = 22 pre-synaptic input spike times (3 encoding neurons per feature plus 1 bias neuron). The output spike time of 1 ms represents the First-room location, 2 ms the Second-room, 3 ms the Third-room, and 4 ms the Fourth-room. The network topology for the dataset is 22:1 (22 input neurons and 1 output neuron).

5 Results and Discussion

To check the performance of the WOLIF classifier, experiments were carried out on the five binary classification problems and the two multi-class classification problems described in Section 4.

The experimental results are compared with state-of-the-art algorithms and found to be better in terms of classification accuracy and overall network parameters, i.e., Synaptic Load (SL). Table 2 shows the network topology, SL, and training and testing classification accuracy, together with the comparison against the state-of-the-art algorithms.

Table 2 Overall performance comparison for binary datasets

In addition, Table 3 compares SEFRON and WOLIF on the binary classification problems, and SpikeProp and WOLIF on the Iris classification problem, in terms of the computational cost (Ccost) defined in (17).

$$ C_{cost} = \frac{SL \times E \times T}{\delta t} $$
(17)

In (17), a lower value of Ccost indicates a computationally more efficient model. All experiments were carried out in Python 3 on a 64-bit Windows 10 desktop PC with an Intel Xeon processor, 8 GB RAM, and a 3.0 GHz clock speed. For each dataset, 10 random training trial sets were generated; Table 2 shows the average training accuracy along with the standard deviation. The selection of parameter values is one of the most crucial steps for working efficiently with SNN, so this section also discusses the role and effectiveness of the major network parameters alongside the experimental results.
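For reference, (17) can be evaluated directly. In the sketch below, the epoch count E = 500 is an assumed illustrative value chosen to be consistent with the Ccost of 2.8 × 10^5 reported for Breast cancer (SL = 28, T = 2 ms, δt = 0.1 ms); it is not a figure stated in the text.

```python
# Minimal sketch of the computational-cost metric, eq. (17).
def c_cost(sl, epochs, T, dt):
    """Synaptic load x epochs x number of simulation steps per pattern."""
    return sl * epochs * T / dt

# Breast cancer with WOLIF: SL = 28, T = 2 ms, dt = 0.1 ms (from the text);
# epochs = 500 is an assumed value consistent with the reported 2.8e5.
print(c_cost(sl=28, epochs=500, T=2.0, dt=0.1))  # 280000.0
```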

Table 3 Comparison of computational cost of binary datasets with SEFRON

Table 4 gives a brief summary of the parameter values used by WOLIF, where η is the number of encoding neurons, and lb and ub represent the lower and upper bounds of the random weight initialisation, respectively.

Table 4 Parameter values used by WOLIF for different types of datasets

5.1 Classification accuracy and Ccost

From Table 2, we observe that for the Breast cancer dataset WOLIF achieves a training accuracy of 97.8% and a testing accuracy of 97.0%, while SEFRON attains 98.3% training accuracy and SpikeProp 97.2% testing accuracy, both slightly higher than WOLIF. However, WOLIF uses a very small synaptic load of 28, almost half that of SEFRON and far below the other algorithms in Table 2. In addition, Table 3 shows a better Ccost for WOLIF (2.8 × 10^5) than for SEFRON (2.2 × 10^6) on the Breast cancer dataset. On the Ionosphere dataset, WOLIF achieves a training accuracy of 94.4%, which can be considered better even though SEFRON reaches 97.0%, because SEFRON performs poorly in testing accuracy and WOLIF's synaptic load of 100 is almost half of SEFRON's. Moreover, for Ionosphere, WOLIF has a Ccost of 1 × 10^6 versus 7.9 × 10^6 for SEFRON. From Table 2, we also observe that the Liver disorders and Pima diabetes datasets are not easily separable into their respective classes; on these two datasets WOLIF outperforms all the other algorithms in terms of testing accuracy and synaptic load, giving 80.3% and 83.3% testing accuracy, respectively. Although SEFRON attains slightly higher training accuracy on both, it performs poorly in testing. Note that WOLIF also achieves a better Ccost on both datasets, as shown in Table 3. Iris is a multi-class classification problem on which WOLIF attains 94.1% training accuracy and 95.1% testing accuracy, very comparable with the other algorithms in Table 5 once synaptic load, one of the most important factors for SNN, is taken into account. WOLIF's synaptic load is almost 8 times smaller than SRESN's and far smaller than those of the other state-of-the-art algorithms.

Table 5 Overall performance comparison for multi class dataset

In addition, SpikeProp has a Ccost of 6.625 × 10^9, while WOLIF has 1.95 × 10^5, as shown in Table 6.

Table 6 Comparison of computational cost of multi class dataset with SpikeProp

Table 7 presents the overall performance of WOLIF on all the benchmark datasets, allowing its behaviour to be analysed across classification problems with widely varying numbers of samples. It shows that even on the 4-class problem (Wireless indoor localization), WOLIF achieves a good training accuracy of 84.6% and a good testing accuracy of 84.8%, considering the number of epochs. In addition, for the Banknote authentication dataset, WOLIF shows very satisfactory training and testing accuracies of 95.5% and 93.2%, respectively.

Table 7 Overall performance of WOLIF in case of all the seven datasets

Figure 4a shows the training accuracy curve for the Breast cancer dataset and Fig. 4b the training accuracy curve for the Ionosphere dataset, each with all 10 random trial sets. Figure 5a and b show the training accuracy curves for the Liver disorders and Pima diabetes datasets, respectively, again with all 10 random trial sets. Figure 6a presents the training accuracy curve for the Iris flower dataset with its 10 random trial sets.

Fig. 4

a Training curve of the Breast cancer dataset with all 10 random trial sets. b Training curve of the Ionosphere dataset with all 10 random trial sets

Fig. 5

a Training curve of the Liver disorders dataset with all 10 random trial sets. b Training curve of the Pima diabetes dataset with all 10 random trial sets

Fig. 6

a Training curve of the Iris flower dataset with all 10 random trial sets. b Illustration of the classifying behaviour of WOLIF on the Iris dataset after training, for three samples of different classes

Figure 6b shows the behaviour of the PSPs for three samples of different classes drawn from the Iris dataset after training. The first PSP reaches the threshold at 1.2 ms instead of the ideal 1 ms, i.e., 0.2 ms late. Likewise, the other two PSPs reach the threshold at 2.2 ms (desired: 2 ms) and 2.9 ms (desired: 3 ms), respectively. Here, the MSE is 4% for class Setosa, 4% for class Versicolor, and 1% for class Virginica; the overall MSE loss is 3%, i.e., a training accuracy of 97%, with one sample per class. Figure 7a and b show the training behaviour of WOLIF on the Banknote authentication and Wireless indoor localization datasets, respectively, with all 10 random trials clearly shown.

Fig. 7

a Training curve of the Banknote authentication dataset with all 10 random trial sets. b Training curve of the Wireless indoor localization dataset with all 10 random trial sets

5.2 Effect of τm and τs

The major focus in SNN is the efficient updating of the membrane potential so that the PSP reaches the threshold value neither too early nor too late. The time constants τm and τs play a crucial role, along with the synaptic weights W, in the synapse model that contributes information to the sub-threshold regime. τm controls the rise of the PSP curve towards the threshold, and τs controls the decaying width of the PSP curve, which matters for the overlap of multiple PSPs. A good selection of values for τm and τs therefore improves the spike-firing capability of the neuron: a high value of τm forces the neuron to fire too early, while a low value of τm does not allow the PSP to rise easily towards the threshold. Since the weights are indirectly multiplied with τm and τs, balancing the two values becomes all the more important. In addition, a higher value of τs produces wider PSPs and therefore less overlap among multiple PSPs. It is recommended to select τm slightly greater than the encoding interval ΔT, i.e., 1 ms in our case; we set τm to 1.1 ms, just one time step above (since the time step δt is 0.1 ms). Note that τs is set to half of τm, which was found to work better experimentally. Figure 8a and b show the shape of the PSPs upon varying τm while keeping τs at half of τm, and upon varying τs while keeping τm at 1.1 ms, respectively. From these figures, the effect of τm and τs on the rise and decay of the PSP is clearly interpretable. Moreover, Fig. 9a shows the PSP shapes when both τm and τs are varied, where the need for overlap is clearly visible.
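The rise/decay trade-off can be made quantitative: for the kernel of (7), the PSP peaks at a delay of τmτs ln(τm/τs)/(τm − τs) after the pre-synaptic spike, a standard property of double-exponential kernels. A minimal sketch (the function name is illustrative):

```python
# Delay from pre-synaptic spike to PSP maximum for the kernel of eq. (7),
# obtained by setting the kernel's time derivative to zero.
import numpy as np

def psp_peak_delay(tau_m=1.1, tau_s=0.55):
    return tau_m * tau_s * np.log(tau_m / tau_s) / (tau_m - tau_s)

print(psp_peak_delay())           # ~0.76 ms with the paper's settings
print(psp_peak_delay(2.0, 1.0))   # larger time constants -> later, wider PSP
```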

Fig. 8

a The value of τm is varied while τs is kept fixed at half of the first value of τm. b The value of τs is varied while τm is kept fixed at 1.1 ms

Fig. 9

a The value of τm is varied in 1 ms steps and τs varies accordingly; in all cases τs is exactly half of τm. b Unweighted excitatory and inhibitory synapses

5.3 Effect of weights initialisation range

SNN is very sensitive to the synaptic weights, a very important parameter that directly affects the spike-firing behaviour of a neuron; the random initialisation of the weights therefore has to be done very carefully. We applied a heuristic rule for selecting the weight initialisation range: set the upper limit less than or equal to the threshold and the lower limit to a small negative value. We selected the range [-0.25, 1], where the negative portion [-0.25, 0) accounts for 20% of the range and corresponds to inhibitory synapses, and the positive portion [0, 1] accounts for 80% and corresponds to excitatory synapses. Figure 9b shows the shapes of an excitatory PSP and an inhibitory PSP, which clearly illustrate their roles in the PSP updating process. Although many researchers claim that a mixture of inhibitory and excitatory synapses prevents a classifier from converging easily, we have used such a mixture efficiently and with a better convergence rate.

5.4 Role of bias neuron

The bias neuron starts the membrane potential updating process early even when many of the other pre-synaptic spike times occur later. Therefore, the spike time of the bias neuron is set to the earliest possible pre-synaptic spike time, i.e., 0 ms.

5.5 Role of time step

A small time step δt requires more iterations over the total simulation time T, thereby increasing the computational cost. We used T = 2 ms for binary classification, 3 ms for 3-class classification, and 4 ms for 4-class classification. With δt set to 0.1 ms, producing a spike takes at most 20 iterations for binary classification, at most 30 for 3-class classification, and at most 40 for 4-class classification. However, a large δt relative to T does not allow the SNN to learn properly from non-linear data and thus hampers the training of the classifier. In SEFRON, T was 4 ms and δt was 0.01 ms, i.e., computationally costlier than WOLIF.

5.6 Effect of encoding neurons

The number of encoding neurons η directly affects the computational cost: a higher η means more input neurons, since population encoding is used in this research. It therefore has to be selected very carefully. By setting η to 3, we successfully minimised the total network load in terms of synaptic connections to an optimum level, as is clearly visible in Tables 2 and 5.

5.7 Stability and generalisation

Analysing the accuracies in Tables 2, 5, and 7, we observe that WOLIF is stable across random trial sets, since the standard deviation does not differ much from the mean accuracy. Moreover, its capability to handle diverse datasets with minimal synaptic load, and without hidden layers even for non-linear temporal patterns, inclines WOLIF towards good generalisation.

6 Conclusion

In this paper, an efficient classifier, WOLIF, along with its learning rule for classifying non-linear temporal patterns, has been presented. WOLIF, which uses the GWO algorithm for weight optimisation and a LIF neuron with a double decaying synapse model for the generation of temporal spikes, achieves very impressive training and testing accuracy as well as computational cost. It is both biologically plausible and computationally efficient; the use of static long-term synaptic weights combining both inhibitory and excitatory synapses supports its biological plausibility. WOLIF outperforms state-of-the-art algorithms on binary classification and performs almost equally well on multi-class classification problems. The total simulation time is also reduced, improving the computational cost relative to the state-of-the-art algorithms. In addition, the stability and generalisation of the WOLIF classifier are noteworthy.

In future work, WOLIF can be improved further by allowing the same neuron to fire multiple spikes.