1 Introduction

Dynamic trafficking of proteins between the cell nucleus and the cytoplasm depends mostly on transport factors in the Karyopherin-\(\beta\) family, named importins and exportins [1–3]. The direction of nuclear–cytoplasmic transport is mainly mediated by targeting signals within the cargo proteins, namely the nuclear localization signal and the nuclear export signal [4, 5]. Nuclear export signals (NESs) are nuclear targeting signals within cargo proteins, composed of four main hydrophobic residues, that target a protein for export from the cell nucleus to the cytoplasm through the nuclear pore complex [6]. Since NESs were first identified in the proteins HIV-1 Rev and the cyclic AMP-dependent protein kinase inhibitor, many other NESs have been experimentally identified in more than 200 proteins, such as translation factors [7], cell cycle regulators [8], transcription factors [9] and viral proteins [10]. A well-known NES is the leucine-rich NES, which mediates binding to the receptor karyopherin exportin 1/chromosomal region maintenance 1 (CRM1) and plays an important role in the replication of many viruses that cause human disease [11–13].

Finding NESs in cellular proteins is a challenging but important problem, and many efforts in cell biology have been made to detect them. Experimental identification of NES-containing proteins is effective, but performing experiments on false NES candidates drawn from large-scale protein sequences consumes considerable resources, time and effort. Identifying NES-containing proteins experimentally requires cell culture and treatments, plasmid construction and other tedious steps; determining whether an amino acid sequence is a NES takes about 3 days in the laboratory. If a large number of NESs are to be detected experimentally, the cost in experimental resources and time becomes huge [14, 15]. A possible way to solve this problem is to develop computational approaches to predicting or identifying NESs from protein sequences, which has paved the way for a hot and promising research branch in bioinformatics [16–18]. Among computational approaches, a general strategy is to construct prediction or identification models from particular biological features of NESs, including secondary structure [19], intrinsic disorder [6], \(\alpha\)-helical structure [20], \(\alpha\)-helix-loop or all-loop structures binding in a hydrophobic groove on the convex surface [21, 22] and regular expressions [23]. Recent research has found that intelligent computing methods, such as machine learning-based strategies [24] and position-specific scoring matrices (PSSM) [25], perform well in predicting NESs from high-throughput amino acid sequence data, with prediction rates of around 60 %.

Neural networks are well-known computational models inspired by the central nervous system of animals. Spiking neural P systems (SN P systems, for short) are spiking neural-like computing models inspired by the way neurons spike and communicate by means of spikes [26]. As a new candidate among spiking neural network models [27], SN P systems perform well in computation: the systems and almost all of their variants achieve Turing completeness. Notably, it has been proved that SN P systems can generate and accept the set of Turing computable natural numbers [26], generate recursively enumerable languages [28] and compute the set of Turing computable functions [29]. Inspired by different biological phenomena and mathematical motivations, many variants of SN P systems have been proposed, such as SN P systems with anti-spikes [30, 31], SN P systems working in asynchronous mode [32], asynchronous SN P systems with local synchronized sets of neurons [33], SN P systems with astrocyte-like control [34], SN P systems with request rules [35, 36], homogeneous SN P systems [37, 38], sequential SN P systems [39, 40] and SN P systems with rules on synapses [41–43]. As for applications, SN P systems have been used to design logic gates, logic circuits [44] and operating systems [45], perform basic arithmetic operations [46], solve combinatorial optimization problems [47] and diagnose faults in electric power systems [48–50]. SN P systems with neuron division, budding or separation can generate exponential space with a space–time trade-off strategy, thus providing a way to theoretically solve computationally hard problems in feasible (polynomial or linear) time [51–54]. There are also several notable simulators for SN P systems, see, for example, [55–57].

In artificial neural networks, a sigmoid function is used to imitate a biological neuron’s spiking, while in SN P systems, spiking rules, written in the form of productions in formal grammars, describe the neuron’s spiking behavior. The applicability of a spiking rule is controlled by the number of spikes contained in the neuron at a certain moment, which determines its triggering condition. A neuron may contain multiple rules, giving it the ability to select among spiking conditions, and it can send out different numbers of spikes by consuming different numbers of spikes.

In this work, we propose a computational approach to identifying NESs using SN P systems. Specifically, 30 experimentally verified NESs, whose unique biological feature is their secondary structure elements, are randomly selected for training the SN P system. Subsequently, 1224 amino acid sequences, composed of 1015 regular amino acid sequences and 209 experimentally verified NESs, are randomly selected from 221 NES-containing protein sequences in NESdb [13] to test our method. The experimental results show that our method achieves a precision rate of 75.41 %, outperforming NES-REBS with a precision rate of 47.2 % [58], Wregex with 25.4 % [59], ELM with 33.5 % [23] and NetNES with 37.4 % [60]. These results are promising, given that this is the first feasible attempt to use spiking neural P systems in computational biology after many theoretical advancements.

2 Spiking neural P system with Hebbian learning strategy

It is useful for readers to have some familiarity with the basic concepts and notions of SN P systems [26, 61].

Formally, an SN P system of degree \(m\ge 1\) is a construct of the form [26]:

$$\begin{aligned} \varPi =(O,\sigma _1,\sigma _2,\dots ,\sigma _m,syn,i_{in},i_{out}), \text{ where } \end{aligned}$$
  • \(O=\{a\}\) is a singleton alphabet and a is called spike;

  • \(\sigma _1,\sigma _2,\ldots ,\sigma _m\) are neurons of the form \(\sigma _i=(n_i,R_i)\), with \(1\le i\le m\), where \(n_i\) is the initial number of spikes in neuron \(\sigma _i\) and \(R_i\) is the set of rules of neuron \(\sigma _i\):

    1. spiking rule: \(E/a^c\rightarrow a^p\), where E is a regular expression over O, c is the number of spikes to be consumed and \(c\ge p\ge 1\);

    2. forgetting rule: \(a^s\rightarrow \lambda\), with the restriction that \(a^s\notin L(E)\) for any spiking rule in \(R_i\);

  • \(syn\subseteq \{1,2,\dots ,m\}\times \{1,2,\dots ,m\}\) with \((i,i)\notin syn\) is the set of synapses between neurons;

  • \(i_{in}\) indicates the input neuron that reads spikes from the environment, and \(i_{out}\) indicates the output neuron that can also emit spikes into the environment.

Consider a spiking rule \(E/a^c\rightarrow a^p\). At a certain moment, when neuron \(\sigma _i\) holds k spikes such that \(a^k\in L(E)\) and \(k\ge c\), the rule can be applied: c spikes are consumed (\(k-c\) spikes remain in neuron \(\sigma _i\)) and neuron \(\sigma _i\) fires, producing p spikes. These spikes are emitted by neuron \(\sigma _i\) to each of its neighboring neurons (a global clock is assumed, marking the time for the whole system; hence, the functioning of the neurons is synchronized). The applicability of a rule is thus controlled by the total number of spikes accumulated in the neuron. Each neuron works sequentially, in the sense that only one of its rules can be applied in each time unit, but different neurons work in parallel.

Forgetting rules are of the form \(a^s\rightarrow \lambda\) with \(s\ge 1\) and are applied only if the neuron contains exactly s spikes; by applying a forgetting rule, the s spikes are removed from the neuron and thus from the system. There is the restriction that when a forgetting rule is used at a computation step, no spiking rule is enabled: in any neuron, if a spiking rule is applicable, then no forgetting rule can be used, and vice versa.
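To make these semantics concrete, the following minimal Python sketch simulates the two rule types. It is our illustration, not an implementation from the paper: regular expressions over \(O=\{a\}\) are encoded as Python regexes, and the class and variable names are ours.

```python
import re

class Neuron:
    """A single SN P neuron with simplified rules E/a^c -> a^p and a^s -> lambda."""

    def __init__(self, spikes, spiking_rules, forgetting=None):
        self.spikes = spikes                 # number of spikes currently held
        self.spiking_rules = spiking_rules   # list of (E, c, p) triples
        self.forgetting = forgetting or []   # list of s values for a^s -> lambda

    def step(self):
        """Apply at most one rule; return the number of spikes emitted."""
        k = self.spikes
        for E, c, p in self.spiking_rules:
            if k >= c and re.fullmatch(E, "a" * k):  # a^k in L(E) and k >= c
                self.spikes -= c                     # consume c spikes; k - c remain
                return p                             # fire, producing p spikes
        # A forgetting rule a^s -> lambda applies only when the neuron holds
        # exactly s spikes and no spiking rule is enabled (checked above).
        if k in self.forgetting:
            self.spikes = 0
        return 0

# Example: a neuron with rules a^2 -> a and a -> lambda, as used by the
# "gathering" neurons of Sect. 2.1.
b = Neuron(2, [("aa", 2, 1)], forgetting=[1])
print(b.step())  # 1: the rule a^2 -> a fires, emitting one spike
```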

The study of incorporating Hebbian learning into SN P systems was initiated in [62], where a theoretical Hebbian learning SN P system model was developed. Those systems were designed in a formal way to update their information, but no application was proposed. Here we construct an SN P system with a Hebbian learning strategy to identify nuclear export signals; the system differs from the ones in [62], although it builds on the same biological facts.

The system is given graphically by a directed graph, where rounded rectangles, containing the initial number of spikes and the rules, represent neurons, and edges represent synapses. Input neurons have incoming synapses from the environment, by which they read spikes; output neurons have outgoing synapses by which they emit spikes into the environment. The system consists of two modules: the input module and the predict module.

2.1 The input module

The input module is composed of an input neuron (reading spike trains from the environment), a trigger neuron (starting the module), 75 “transmitting” neurons labeled \(A_1, A_2, \dots , A_{75}\) and 75 “gathering” neurons labeled \(B_1, B_2, \dots , B_{75}\). The topological structure of the input module is shown in Fig. 1.

  • The input neuron initially contains no spike and has the unique spiking rule \(a\rightarrow a\). At any moment when the input neuron holds one spike, it fires by using spiking rule \(a\rightarrow a\), emitting one spike to each of its neighboring neurons (the ones with synapses pointing from neuron Input). The function of the input neuron is to read spike trains (in the form of binary strings) from the environment bit by bit, as follows. Let \(w=w_1w_2\dots w_{75}\) be the spike train to be read, with \(w_i\in \{0,1\}\), and suppose the input neuron starts to read it at a certain moment t. At each step \(t+p\) (\(1\le p\le 75\)), the input neuron reads one bit of w: if \(w_p=1\), it reads one spike from the environment; otherwise, it reads no spike.

  • The trigger neuron starts the computation of the module. All neurons of the input module initially contain no spike, except that the trigger neuron contains one spike. With this spike, the trigger neuron fires at the first step of the computation, consuming the spike and emitting one spike to its unique neighboring neuron, \(\sigma _{A_1}\).

  • The 75 “transmitting” neurons labeled \(A_1, A_2, \dots , A_{75}\) have spiking rule \(a\rightarrow a\). For any \(2\le i\le 74\), when neuron \(\sigma _{A_i}\) receives one spike from neuron \(\sigma _{A_{i-1}}\), it fires by using the spiking rule \(a\rightarrow a\), emitting one spike to neuron \(\sigma _{A_{i+1}}\). When neuron \(\sigma _{A_{75}}\) fires, it sends one spike back to neuron \(\sigma _{A_1}\), and a new cycle starts.

  • The 75 “gathering” neurons labeled \(B_1, B_2, \dots , B_{75}\) have spiking rule \(a^2\rightarrow a\) and forgetting rule \(a\rightarrow \lambda\): when neuron \(\sigma _{B_i}\) holds two spikes, it fires by using the spiking rule \(a^2\rightarrow a\), emitting one spike to neuron \(\sigma _{C_i}\); when it holds only one spike, the spike is removed by the forgetting rule \(a\rightarrow \lambda\). Each neuron \(\sigma _{B_i}\) has one synapse from the input neuron and one from “transmitting” neuron \(\sigma _{A_i}\). Hence, only when both the input neuron and neuron \(\sigma _{A_i}\) fire does neuron \(\sigma _{B_i}\) accumulate two spikes and fire by the spiking rule \(a^2\rightarrow a\), sending one spike to neuron \(\sigma _{C_i}\) (see the sketch after Fig. 1).

Fig. 1 Input module
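The coincidence logic of the input module admits a compact sketch: at step \(t+p\) only transmitting neuron \(\sigma _{A_p}\) is active, so gathering neuron \(\sigma _{B_p}\) fires exactly when the p-th bit of the spike train is 1. The function below is our illustration under that invariant; the names are not from the paper.

```python
def read_spike_train(w):
    """Return the indices p of the neurons C_p that receive a spike
    when the 75-bit spike train w is read by the input module."""
    assert len(w) == 75
    fired = []
    for p, bit in enumerate(w, start=1):
        spikes_in_B = 1 + (bit == "1")  # one spike from A_p, one more if w_p = 1
        if spikes_in_B == 2:            # B_p fires by a^2 -> a toward C_p
            fired.append(p)
        # otherwise the single spike in B_p is removed by a -> lambda
    return fired

print(read_spike_train("101" + "0" * 72))  # [1, 3]
```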

2.2 The predict module

The predict module consists of 75 “processing” neurons, labeled \(C_1, C_2, \dots , C_{75}\), and four output neurons \(\sigma _{Output_1},\sigma _{Output_2},\sigma _{Output_3},\sigma _{Output_4}\). All the neurons in the predict module have the unique spiking rule \(aa^*/a\rightarrow a\), and the weights of all the synapses are initially set to 1. The spiking rule \(aa^*/a\rightarrow a\) can be used whenever neuron \(\sigma _{C_i}\) contains at least one spike; at each transition step, one spike is consumed and one spike is emitted. For example, if neuron \(\sigma _{C_i}\) accumulates k spikes, then it will fire k times, emitting one spike per step and k spikes in total. The 75 “processing” neurons are arranged in three layers: the inner layer, the hidden layer and the outermost layer. The topological structure of the predict module is like a “ripple” with three layers, as shown in Fig. 2. Spikes are transmitted from the inner layer to the hidden layer and then to the outermost layer.

Fig. 2 General “ripple”-like framework of the SN P system with three layers

The inner layer consists of 11 neurons, framed by a red dashed line, each of which has a connection to every neuron in the hidden layer, framed by a blue dashed line. The hidden layer has four subgroups, named the top, bottom, leftward and rightward subgroups; the top and bottom subgroups have 3 neurons each, while the leftward and rightward subgroups have 11 neurons each. The outermost layer, framed by green dashed lines, is composed of 36 neurons, divided into four subgroups as well: the top and bottom subgroups have 5 neurons each, and the leftward and rightward subgroups have 13 neurons each. Fig. 3 shows the neurons involved in each layer.

Fig. 3 Predict module, where “\(\rightarrow\)” means that each neuron in the former dashed frame has one synapse to every neuron in the latter dashed frame, and neurons within the same dashed frame have no synapses among each other

Each neuron in the top (resp. bottom, leftward, rightward) subgroup of the hidden layer has a synapse to every neuron in the top (resp. bottom, leftward, rightward) subgroup of the outermost layer. The four output neurons collect information from the four (top, bottom, leftward and rightward) subgroups of the outermost layer, respectively. The result of a computation is a 4-dimensional vector recording the numbers of spikes emitted into the environment by the 4 output neurons.
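The “ripple” topology can be built programmatically as below. The subgroup sizes follow the text (11 + 28 + 36 = 75 processing neurons); the concrete index assignment within each layer is our own assumption for illustration.

```python
inner = list(range(11))                                          # inner layer
hidden_sizes = {"top": 3, "bottom": 3, "left": 11, "right": 11}  # 28 neurons
outer_sizes = {"top": 5, "bottom": 5, "left": 13, "right": 13}   # 36 neurons

def allocate(sizes, start):
    """Assign consecutive indices to each subgroup."""
    groups, i = {}, start
    for name, n in sizes.items():
        groups[name] = list(range(i, i + n))
        i += n
    return groups, i

hidden, nxt = allocate(hidden_sizes, len(inner))  # indices 11..38
outer, nxt = allocate(outer_sizes, nxt)           # indices 39..74

synapses = set()
for i in inner:                      # every inner neuron -> every hidden neuron
    for group in hidden.values():
        synapses.update((i, j) for j in group)
for name in hidden:                  # hidden subgroup -> matching outermost subgroup
    synapses.update((i, j) for i in hidden[name] for j in outer[name])

print(nxt, len(synapses))  # 75 neurons, 11*28 + 2*(3*5) + 2*(11*13) = 624 synapses
```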

2.3 The Hebbian learning strategy

Each “processing” neuron \(\sigma _{C_i}\) has a synapse from “gathering” neuron \(\sigma _{B_i}\), from which it receives spikes while the system reads a spike train through the input neuron bit by bit. With spikes inside, neuron \(\sigma _{C_i}\) can fire and send spikes to its neighboring neurons. Whenever a neuron sends spikes along a synapse of the predict module, a Hebbian learning strategy is imposed on that synapse: its weight is increased by an increment \(\Delta w\) each time spikes pass along it. In general, if neuron \(\sigma _{C_i}\) has fired t times, passing spikes along a synapse t times, then the weight of that synapse is \(1+t\times \Delta w\). Note that the weights on the synapses among neurons of the input module are fixed during the computation.

The weight on a synapse amplifies the spikes passing along it: if at some moment the weight of a synapse is w and k spikes pass along it, then in total \(w\times k\) spikes are received by the target neuron, where they accumulate. Equation (1) gives the weight at moment \(t+1\) of the synapse connecting neurons \(\sigma _{C_i}\) and \(\sigma _{C_j}\).

$$\begin{aligned} w_{t+1}=\left\{ \begin{array}{ll} w_t, &\quad \hbox {neuron }\sigma _{C_i} \hbox { remains inactive at step }t; \\ w_t+\Delta w, &\quad \hbox {neuron }\sigma _{C_i} \hbox { fires at step }t. \end{array} \right. \end{aligned}$$
(1)

The weights on the synapses connecting each pair of neurons \(\sigma _{B_i}\) and \(\sigma _{C_i}\) are updated with the Hebbian learning strategy as well, while the weights on the synapses within the input module always stay 1.
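A direct transcription of Eq. (1) and of the weighted transmission described above; the names are ours, and \(\Delta w = 0.1\) anticipates the setting of Sect. 4.

```python
DELTA_W = 0.1  # unit increment, as set in Sect. 4

def hebbian_update(w_t, fired):
    """Eq. (1): weight at step t+1 of a synapse leaving neuron C_i."""
    return w_t + DELTA_W if fired else w_t

def spikes_received(w, k):
    """Weighted transmission: k spikes over a synapse of weight w
    deliver w * k spikes to the target neuron."""
    return w * k

# After firing t = 5 times, the weight is 1 + 5 * 0.1 = 1.5:
w = 1.0
for _ in range(5):
    w = hebbian_update(w, fired=True)
print(w, spikes_received(w, 2))  # 1.5 3.0
```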

During the computation, the topological structure of the input module does not change, while the topological structure of the predict module is modified by the strategy updating the weights on its synapses.

3 Identification of NES by the SN P system

In this section, NES identification using the SN P system with Hebbian learning is presented. We first explain how the secondary structure of a NES is encoded into a binary sequence, and then elaborate the training and prediction processes.

3.1 Encoding secondary structure of NES into binary sequence

The information that the SN P system reads is encoded in the form of spike trains, i.e., binary sequences. Before using the SN P system to identify NESs, it is therefore necessary to encode the secondary structure of each NES into a binary sequence.

The secondary structure of a NES is usually a loop conformation or a helix-loop conformation starting with an \(\alpha\)-helix. We use the secondary structure prediction tool PSIPRED to compute the secondary structure of NESs, by which the secondary structure of a NES is described by a string of letters. Each letter denotes a specific structural element, such as H (\(\alpha\)-helix), B (residue in isolated \(\beta\)-bridge), G (3-helix), S (bend), I (5-helix), T (hydrogen-bonded turn) and E (extended strand). In total, 20 letters are used to describe the secondary structure of NESs, and each letter is represented by a distinct binary string of five bits. The binary strings for the letters describing the secondary structure of NESs are shown in Table 1.

Table 1 Binary strings for the letters describing the secondary structure of NESs

With this encoding method, any NES can be represented by a binary string: a NES is a sequence of amino acids, whose secondary structure is obtained by PSIPRED and represented by a string of letters; with the codes in Table 1, this string of letters is transformed into a binary string. An example of encoding the NES “90-L R S E E V H W L H V D M G V-104” into a binary string is given in Table 2.

Table 2 An example of encoding NES into binary string
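The encoding step can be sketched as follows. Table 1 fixes the actual five-bit code of each of the 20 letters; since the table is not reproduced here, the codes below are placeholders assigned by enumeration order, not the paper’s codes.

```python
# Seven of the 20 structure letters are named in the text; the full
# alphabet and the real five-bit codes are given in Table 1.
LETTERS = ["H", "B", "G", "S", "I", "T", "E"]
CODES = {ch: format(i, "05b") for i, ch in enumerate(LETTERS)}  # placeholders

def encode_structure(structure):
    """15-letter secondary-structure string -> 75-bit spike train."""
    assert len(structure) == 15
    return "".join(CODES[ch] for ch in structure)

s = encode_structure("HHHHHHHHTSSSSSS")
print(len(s))  # 75
```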

3.2 The general process of identifying NESs

In general, the process of identifying NESs (represented by binary strings/spike trains) with the SN P system has four stages: the reading stage, the training stage, generating the standard output and identifying unknown NESs.

3.2.1 Reading stage

A set of NESs is randomly selected and encoded into binary strings via their secondary structures. The binary strings are read by the input neuron bit by bit. Since each NES has length 15 (15 amino acids), the string encoding its secondary structure has 15 letters; with the codes in Table 1, each binary string therefore has length 75. This is why the input module is designed with 75 “transmitting” neurons and 75 “gathering” neurons.

3.2.2 Training stage

The input neuron reads binary strings bit by bit. Suppose the input neuron starts to read a binary string at a certain moment t. When it reads one spike from the environment at step \(t+p\) (\(1\le p \le 75\)), it fires and sends one spike to “gathering” neuron \(\sigma _{B_p}\). Meanwhile, “transmitting” neuron \(\sigma _{A_p}\) sends one spike to neuron \(\sigma _{B_p}\). With two spikes inside, neuron \(\sigma _{B_p}\) fires by using spiking rule \(a^2\rightarrow a\), sending one spike to neuron \(\sigma _{C_p}\). Having at least one spike inside, neuron \(\sigma _{C_p}\) fires by using spiking rule \(aa^*/a\rightarrow a\), emitting one spike to each of its neighboring neurons. At the same time, the weights on the synapses along which these spikes pass are increased by \(\Delta w\).

If the input neuron reads no spike from the environment (the corresponding bit of the binary string is 0), then “gathering” neuron \(\sigma _{B_p}\) receives only the one spike from “transmitting” neuron \(\sigma _{A_p}\). In this case, neuron \(\sigma _{B_p}\) cannot fire and sends no spike out; neuron \(\sigma _{C_p}\) receives no spike and remains inactive, so the weights on the synapses starting from neuron \(\sigma _{C_p}\) remain unchanged.

When the system finishes reading one binary string, the input module returns to its initial configuration and is ready to read the next one; hence, the system can read multiple binary strings one by one. While the binary strings are read, the “processing” neurons \(\sigma _{C_i}\) may fire, and the weights on the synapses among the “processing” neurons are updated with the Hebbian learning strategy. The four output neurons emit spikes into the environment, but these spikes are ignored during training. When the system finishes reading the set of binary strings of NESs, it has formed a specific topological structure by processing the input information.
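The training stage can thus be summarized by the sketch below. Only the entry step, where neuron \(\sigma _{B_p}\) passes a spike to neuron \(\sigma _{C_p}\), is modeled explicitly; in the full system the spikes keep propagating through the hidden and outermost layers, and every synapse they traverse is strengthened in the same way. All names are our own.

```python
def active_positions(w75):
    """Positions p with w_p = 1, i.e., the neurons C_p that receive a spike."""
    return [p for p, bit in enumerate(w75, start=1) if bit == "1"]

def train(spike_trains, synapses, delta_w=0.1):
    """spike_trains: 75-bit strings of the training NESs;
    synapses: (i, j) pairs among processing neurons."""
    weights = {s: 1.0 for s in synapses}  # all plastic weights start at 1
    for w75 in spike_trains:
        for p in active_positions(w75):
            for (i, j) in weights:
                if i == p:                # strengthen synapses leaving C_p
                    weights[(i, j)] += delta_w
    return weights

# Toy run: one training string activating C_1 strengthens synapse (1, 2) only.
print(train(["1" + "0" * 74], {(1, 2), (3, 4)}))
```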

3.2.3 Generating standard output

For each of the NESs used in training the SN P system, the binary string representing its secondary structure is fed into the trained system. In total, 30 four-dimensional vectors are obtained, recording the numbers of spikes emitted by output neurons \(\sigma _{Output_1},\sigma _{Output_2},\sigma _{Output_3},\sigma _{Output_4}\) when reading the 30 NESs. The average of the 30 output vectors, denoted by \((stan_1,stan_2,stan_3,stan_4)\), is called the standard output vector.
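Computing the standard output vector is a component-wise average, as in this short sketch (names are ours):

```python
def standard_vector(output_vectors):
    """Component-wise average of the four-dimensional output vectors."""
    n = len(output_vectors)
    return tuple(sum(v[i] for v in output_vectors) / n for i in range(4))

print(standard_vector([(4, 2, 0, 2), (2, 4, 2, 0)]))  # (3.0, 3.0, 1.0, 1.0)
```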

3.2.4 Identifying unknown NESs

The task of identifying NESs is to judge whether an amino acid sequence is a NES. For any amino acid sequence, the binary string of its secondary structure is obtained by PSIPRED and then fed into the trained SN P system. When the system halts, a 4-dimensional vector \((out_1,out_2,out_3,out_4)\) is generated, recording the numbers of spikes emitted by the four output neurons. We then calculate the variance (the Euclidean distance) between the output vector of the amino acid sequence and the standard output vector. The variance is calculated by

$$\begin{aligned} var=\sqrt{\sum ^4_{i=1}(out_i-stan_i)^{2}}, \end{aligned}$$

where \((out_1,out_2,out_3,out_4)\) is the output vector of the amino acid sequence under test and \((stan_1,stan_2,stan_3,stan_4)\) is the standard output vector obtained in the training stage. If the variance is lower than a threshold, then the amino acid sequence is determined to be a potential NES.
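The decision rule is therefore a threshold test on a Euclidean distance; a minimal sketch, with the threshold 525 of Sect. 4 as the default:

```python
import math

def is_potential_nes(out, stan, threshold=525.0):
    """Threshold test on the variance defined above."""
    var = math.sqrt(sum((o - s) ** 2 for o, s in zip(out, stan)))
    return var < threshold

print(is_potential_nes((100, 90, 80, 70), (110, 95, 85, 75)))  # True (var = 13.2)
```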

4 Experimental results

In the experiments, the secondary structure elements of 30 randomly selected, experimentally verified NESs are used for training the SN P system. The 30 selected NESs are shown in Table 3.

Table 3 Thirty randomly selected NESs for training the SN P system

In training the SN P system, we set the unit increment \(\Delta w\) to 0.1; that is, when a spike passes along a synapse of the predict module, the weight of that synapse is increased by \(\Delta w=0.1\). The threshold value is set to 525, obtained by calculating the average variance over all pairs of NESs used to train the system. For any amino acid sequence, if the variance between its output vector (computed by the trained SN P system) and the standard output vector is less than 525, then the sequence is determined to be a NES; otherwise, it is determined to be a regular (non-signal) amino acid sequence. To test our method, we use the trained SN P system to identify 209 experimentally verified NESs among 1224 amino acid sequences, where the other 1015 regular amino acid sequences are randomly extracted from 221 NES-containing protein sequences.
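A sketch of how such a threshold can be computed from the training outputs, assuming the 30 training output vectors are available; this mirrors the description above rather than reproducing the authors’ code:

```python
from itertools import combinations
import math

def pairwise_threshold(train_vectors):
    """Average variance (Euclidean distance) over all pairs of
    training output vectors."""
    pairs = list(combinations(train_vectors, 2))
    total = sum(math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
                for u, v in pairs)
    return total / len(pairs)
```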

Experimental results show that the SN P system correctly identifies 114 of the 209 experimentally verified NESs and correctly classifies 809 of the 1015 regular amino acid sequences. The distributions of the variances of the 209 experimentally verified NESs and of the 1015 regular amino acid sequences are shown in Figs. 4 and 5.

Fig. 4 Distribution of the numbers of NESs and their variance

Fig. 5 Distribution of the numbers of amino acid sequences and their variance

Hence, our method achieves a precision rate of \(\frac{114+809}{209+1015}\approx 75.41\,\%\). By contrast, NES-REBS has a precision rate of 47.2 % [58], Wregex of 25.4 % [59], ELM of 33.5 % [23] and NetNES of 37.4 % [60].

For statistical analysis, the proposed method is also applied to 2530 randomly generated amino acid sequences. It identifies 1792 of them as non-NESs, that is, a rate above 70 %, indicating that the results are statistically meaningful.

5 Conclusion

In this work, we address the challenge of identifying NESs from amino acid sequences using SN P systems. An SN P system with a Hebbian learning strategy is first constructed, consisting of an input module and a predict module. The secondary structure elements of 30 randomly selected, experimentally verified NESs are then used to train the SN P system. We test our method on 1224 amino acid sequences, where 1015 regular amino acid sequences and 209 experimentally verified NESs are randomly extracted from 221 NES-containing protein sequences in NESdb. Experimental results show that our method achieves a precision rate of 75.41 %, outperforming NES-REBS, Wregex, ELM and NetNES.

In our method, the secondary structure elements of experimentally verified NESs are used to train the SN P system. There are, however, other biochemical properties that could be used, listed as follows from [24].

  • Secondary structure prediction of the regular expression match sequences.

  • Average predicted surface accessibility of the regular expression match sequences.

  • Average predicted disorder score of the regular expression match sequences.

  • Hydrophobicity of the regular expression match sequences; negatively charged residues in the upstream flank.

  • Whether the first two residues are involved in a \(\beta\)-strand, based on secondary structure prediction.

  • Prediction of polar residues in the downstream flank.

  • Distance to previous match of the regular expression divided by the protein length.

Investigating the performance of other biological properties in training SN P systems, as well as other computing models, is worth further research. For example, biological networks [63–66] and machine learning methods [67–70] can be considered in this respect.

In our SN P system, a simple Hebbian learning strategy is used to update the weights on the synapses among neurons of the predict module. A potential direction for further research is to design more complex learning strategies and to involve recently developed large-scale neural network training algorithms, see, for example, [71, 72], for the training task. It would also be quite interesting to identify the inherent advantages of SN P systems compared with other models and methods.

Also, some variants of SN P systems, see, for example, [31, 46, 73, 74], may be used to improve the performance of our method. The architecture of the SN P system in this work is designed from the biological observation that spikes propagate in a neural network from the inside outward: the hidden-layer and outermost-layer neurons are each divided into four subgroups, and each hidden-layer neuron connects to the neurons in the matching subgroup of the outermost layer. At present, there is no general theory for designing SN P systems for pattern recognition, which makes this an interesting topic for future research. Artificial intelligence models and algorithms have been used to solve practical problems, see, for example, [75–79]; it is of interest to use SN P systems to solve other real-life problems as well.