1 Introduction

Artificial immune systems (AIS), inspired by the natural immune systems, are an emerging kind of soft computing methods [1]. With the distinguishing features of pattern recognition, data analysis, and machine learning, the AIS have recently gained considerable research interest from different communities [24]. As an important partner of the AIS, negative selection algorithm (NSA) is based on the principles of maturation of T cells and self/nonself discrimination in the biological immune systems. It was developed by Forrest et al. [5] in 1994 for real-time detection of computer virus. During the past decade, the NSA has been widely applied in such promising engineering areas as anomaly detection [6], networks security [7], milling tool breakage detection [8], and aircraft fault detection [9]. Detectors of the original NSA are usually first generated in a random manner, and undergo the so-called ‘negative censoring’ process thereafter. Only the qualified detectors that do not match the self are selected and used to detect changes/anomaly (nonself) in the fresh input patterns. Unfortunately, with no capability of adaptation, these detectors are not well suited for dealing with real-world problems under the time-varying circumstances. To overcome this drawback as well as improve the performance of conventional NSA detectors, an adaptive neural networks-based NSA is proposed here, the corresponding training algorithm is derived, and its applications in fault diagnosis are also explored.

This paper is organized as follows. The essential principles of the regular NSA are briefly discussed in Sect. 2. In Sect. 3, the neural networks-based NSA with a unique learning algorithm is introduced. A general fault diagnosis framework using the new NSA is constructed in the following section. Simulations of two numerical examples, anomaly detection in Mackey–Glass time series and bearings fault diagnosis, are made in Sect.  5 to verify the proposed scheme. Finally, in Sect. 6, the paper is concluded with some remarks and conclusions.

2 Principles of negative selection algorithm

The natural immune system is an efficient self-defense system that can protect the human body from being affected by foreign antigens or pathogens. One of its important functions is pattern recognition and classification. In other words, the immune system is capable of distinguishing the self, i.e., normal cells, from the nonself, such as bacteria, viruses, and cancer cells. This capability is mainly achieved by two types of lymphocytes: B cells and T cells. Both the B cells and T cells are produced in the bone marrow. However, for the T cells, they must pass a negative selection procedure in the thymus thereafter. Only those that do not match the self proteins of the body will be released out, while the remaining T cells are eventually destroyed there. Such censoring of T cells actually can prevent the immune system from attacking the body’s own proteins.

Forrest et al. proposed the NSA to mimic the aforementioned self/nonself discrimination mechanism of the biological immune system. Their approach can be conceptually described as follows. A self data set containing all the representative self samples is first collected. Next, the candidate detectors in binary strings are randomly generated, and compared with the self set. Like the above negative selection of T cells, only those detectors that do not match any element of this set are retained. Finally, the qualified detectors can be used to detect the nonself or anomaly. It is obvious that effective generation of the detectors is pivotal in the NSA, which depends on a few factors, e.g., the size of the self set, the matching rule between the detectors and self samples, and the detector generation strategy [10]. Particularly, the form of the detectors plays a crucial role here. González compared the performances of three existing detector representation schemes including hyper-rectangles, fuzzy rules, and hyper-spheres in [11]. Unfortunately, conventional NSA detectors represented by either binary strings or real values usually are not adaptive [12, 13]. Therefore, in the next section, a novel neural networks-based NSA is proposed to handle this drawback.

3 Neural networks-based negative selection algorithm

In the new neural networks-based NSA, detectors are built on the structures of three-layer feedforward neural networks, as shown in Fig. 1. \({{\left[ {x_{1}, x_{2}, \ldots, x_{N}} \right]}}\) is the input vector, N is the number of inputs, and \({{\left[ {w_{1}, w_{2}, \ldots, w_{N}} \right]}}\) and \({{\left[ {v_{1}, v_{2}, \ldots, v_{N}} \right]}}\) are the connection weights between Layer 1 & Layer 2 and Layer 2 & Layer 3, respectively. More precisely, in Layer 1, the matching degree d i between x i of \({{\left[ {x_{1}, x_{2}, \ldots, x_{N}} \right]}}\) and w i of \({{\left[ {w_{1}, w_{2}, \ldots, w_{N}} \right]}}\) is calculated in every input node i:

$$d_{i} = {\left({x_{i} - w_{i}} \right)}^{{2}}, $$
(1)

where i =  1,2, ..., N. The hidden node outputs are weighted in Layer 2 by \({{\left[ {v_{1}, v_{2}, \ldots, v_{N}} \right]}}.\) Thus, the final single output of Layer 3, y, can be given:

$$y = {\sum\limits_{i = 1}^N {v_{i} {\rm f}(d_{i})}} = {\sum\limits_{i = 1}^N {v_{i} {\rm f}{\left[ {(x_{i} - w_{i})^{2}} \right]}}}.$$
(2)

f(·) is the node function of Layer 2. Similarly with the original NSA, y is compared with a preset threshold λ, and the detector matching error E is obtained:

$$E = y - \lambda. $$
(3)

If for any \({{\left[ {x_{1}, x_{2}, \ldots, x_{N}} \right]}}\) in all the training samples, E >  0, this detector does not match the self. It can, therefore, be used for detecting the nonself.

Fig. 1
figure 1

Neural networks-based negative selection algorithm (NSA) detectors

It should be emphasized in the neural networks-based NSA, both \({{\left[ {w_{1}, w_{2}, \ldots, w_{N}} \right]}}\) and \({{\left[ {v_{1}, v_{2}, \ldots, v_{N}} \right]}}\) are trainable. However, different from the normal gradient descent oriented Back-Propagation (BP) training method [14] that always tries to minimize E, a ‘positive’ learning algorithm is derived here for these two sets of weights. The goal of such training is actually to raise the matching error between the NSA detectors and self samples so that the qualified detectors can be generated. Figure 2 illustrates the principle of this training approach. Let \({{\mathbf{w}}}\) and \({{\mathbf{v}}}\) denote \({{\left[ {w_{1}, w_{2}, \ldots w_{N}} \right]}}\) and \({{\left[ {v_{1}, v_{2}, \ldots, v_{N}} \right]}},\) respectively. Starting from a given point \({({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}\) in the weight-matching error \({(({\mathbf{w}},{\mathbf{v}}) - E({\mathbf{w}},{\mathbf{v}}))}\) space, in order to decrease \({E({\mathbf{w}},{\mathbf{v}})}\) by means of the regular BP learning, weights \({{\mathbf{w}}^{*} }\) and \({{\mathbf{v}}^{*} }\) must be changed by \({\Delta {\mathbf{w}}^{*} }\) and \({\Delta {\mathbf{v}}^{*} },\) which are proportional to the negative gradient value of \({E({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}:\)

$$\Delta {\mathbf{w}}^{*} = - \eta \nabla _{{{\mathbf{w}}^{*}}} E({\mathbf{w}}^{*}, {\mathbf{v}}^{*}) = - \mu \frac{{\partial E({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}}{{\partial {\mathbf{w}}^{*}}},$$
(4)
$$\Delta {\mathbf{v}}^{*} = - \eta \nabla _{{{\mathbf{v}}^{*}}} E({\mathbf{w}}^{*}, {\mathbf{v}}^{*}) = - \mu \frac{{\partial E({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}}{{\partial {\mathbf{v}}^{*}}},$$
(5)

where μ is the learning rate. On the other hand, in the new training algorithm of the proposed neural networks-based NSA, \({\Delta {\mathbf{w}}^{*} }\) and \({\Delta {\mathbf{v}}^{*} }\) are chosen to indeed increase \({E({\mathbf{w}},{\mathbf{v}})}\) for appropriate detectors generation and tuning. This is accomplished by employing the positive gradient information, as the solid line in Fig. 2 shows, in the \({\Delta {\mathbf{w}}^{*} }\) and \({\Delta {\mathbf{v}}^{*} }\) calculation. Therefore, there are:

$$\Delta {\mathbf{w}}^{*} = \eta \nabla _{{{\mathbf{w}}^{*}}} E({\mathbf{w}}^{*}, {\mathbf{v}}^{*}) = \mu \frac{{\partial E({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}}{{\partial {\mathbf{w}}^{*}}},$$
(6)
$$\Delta {\mathbf{v}}^{*} = \eta \nabla _{{{\mathbf{v}}^{*}}} E({\mathbf{w}}^{*}, {\mathbf{v}}^{*}) = \mu \frac{{\partial E({\mathbf{w}}^{*}, {\mathbf{v}}^{*})}}{{\partial {\mathbf{v}}^{*}}}.$$
(7)

w (k) i and v (k) i are denoted as the instant values of w i and v i at iteration k, respectively. They can be updated:

$$w^{{(k + 1)}}_{i} = w^{{(k)}}_{i} + \mu \frac{{\partial E}}{{\partial w^{{(k)}}_{i}}} = w^{{(k)}}_{i} + \mu \frac{{\partial y}}{{\partial w^{{(k)}}_{i}}} = w^{{(k)}}_{i} - 2\mu v^{{(k)}}_{i} {{\rm f}}^{\prime}{\left\{{{\left[ {x_{i} - w^{{(k)}}_{i}} \right]}^{{2}}} \right\}}{\left[ {x_{i} - w^{{(k)}}_{i}} \right]},$$
(8)
$$v^{{(k + 1)}}_{i} = v^{{(k)}}_{i} + \mu \frac{{\partial E}}{{\partial v^{{(k)}}_{i}}} = v^{{(k)}}_{i} + \mu \frac{{\partial y}}{{\partial v^{{(k)}}_{i}}} = v^{{(k)}}_{i} + \mu {\rm f}{\left\{{{\left[ {x_{i} - w^{{(k)}}_{i}} \right]}^{{2}}} \right\}}.$$
(9)

The above training algorithm has several remarkable advantages. Firstly, it provides a more effective detector generation scheme than the conventional solutions, in which the binary or real-valued detectors are initially generated in a stochastic manner. Secondly, if applied as a postprocessing phase, it can automatically fine-tune those rough detectors. Thirdly, embedded with the advantageous characteristics of adaptation, the neural networks-based NSA is capable of coping with the time-varying anomaly detection problems. The application of this new NSA in fault diagnosis will be demonstrated in Sect. 4.

Fig. 2
figure 2

Principle of training algorithm for neural networks-based NSA detectors (dotted line negative gradient, solid line positive gradient)

4 Fault diagnosis using neural networks-based negative selection algorithm

Fault diagnosis methods are crucial in modern industry to guarantee the normal working conditions of plants, such as electrical machines and motors [15]. The anomalies in the feature signals acquired from operating plants are considered to be caused by only incipient faults. Hence, fault diagnosis is converted to a problem of anomaly detection, i.e., self/nonself discrimination, in the characteristic time series, which can be solved by utilizing the new NSA. The proposed fault diagnosis scheme consists of four stages. Firstly, the feature signals from healthy as well as faulty plants are sampled and preprocessed. Secondly, a certain number of eligible detectors, S, are generated based on the negative selection principle. Only the feature signals of healthy plants are used at this step. Thirdly, these detectors are trained with the feature signals of both healthy and faulty plants. In addition to the learning algorithm presented in Sect. 3, a ‘margin’ training strategy is also developed for the neural networks-based detectors. The detector weights are separately adapted in the following two cases (regions), refer to Fig. 3.

  • Case 1 (for the faulty plant feature signals only): if 0 <  E <  γ (Training Region I in Fig. 3), the detectors are trained using the normal BP learning algorithm to decrease E; otherwise, no training is employed.

  • Case 2 (for the healthy plant feature signals only): if −γ <  E <  0 (Training Region II in Fig. 3), the detectors are trained using the above ‘positive’ BP learning algorithm to increase E; otherwise, no training is employed.

Fig. 3
figure 3

Margin training regions of neural networks-based NSA detectors in fault diagnosis

γ, the margin parameter, is a minor percentage of λ, e.g., 0.1 λ. Finally, after this weight refinement stage, the detectors can be deployed to detect possible anomalies, i.e., existing faults.

AC and DC motors have been intensively applied in various industrial applications. However, changing working environments and dynamical loading always strain and wear motors, and further cause some incipient faults, such as shorted turns, broken bearings, and damaged rotor bars. These faults can result in serious performance degradation and even eventual system failures, if they are not properly detected as well as handled. Therefore, motor drive monitoring and fault diagnosis are very important and challenging topics in the electrical engineering field [16].

The proposed fault diagnosis method is indeed independent of plants and faults, and can, thus, be a general-purpose solution to a large variety of fault diagnosis problems. Figure 4 shows the structure of the above neural networks-based NSA in motor fault diagnosis, in which there are two main phases involved: detector training phase and fault detection phase. In the detector training phase, the feature signals of both healthy and faulty motors are first collected beforehand. They are split into non-overlapping windows in the signal preprocessing unit, denoted by \({{\left[ {x_{1}, x_{2}, \ldots x_{N}} \right]}}\) in Fig. 1, as the input patterns of the NSA detectors. With the negative selection approach and detector training algorithms, the S qualified detectors are next generated. In the fault detection phase, these detectors are used to detect any anomaly in the feature signals acquired from the operating motors. Fault diagnosis results can be obtained based on the statistics of activated detectors. A detector is considered activated, if E <  0. For the motor time series under examination, the numbers of both correctly activated detectors and incorrectly activated detectors, M and L, are counted. The quantitative performance criterion, fault detection rate η, is defined:

$$\eta = \frac{M}{{M + L}} \times 100\% .$$
(10)

In the next section, the effectiveness of this new neural networks-based NSA and motor fault diagnosis scheme is verified using computer simulations.

Fig. 4
figure 4

Neural networks-based NSA in motor fault diagnosis

5 Simulations

Two representative testbeds, anomaly detection in chaotic time series and inner raceway fault detection of motor bearings, are employed in the simulations.

5.1 Anomaly detection in chaotic time series

As a popular benchmark for intelligent data analysis methods, the Mackey–Glass chaotic time series, x(t), can be generated by the following nonlinear differential equation [17]:

$$\dot{x}(t) = \frac{{ax(t -\tau)}}{{1 + x^{c} (t - \tau)}} - bx(t),$$
(11)

where a =  0.1, b =  0.2, and c =  10. Figure 5 a and b illustrate two different Mackey–Glass time series with τ =  30 and τ =  17, respectively. Suppose the normal Mackey–Glass time series result from τ =  30. The goal of utilizing the neural networks-based NSA here is to detect the anomaly caused by τ =  17. Hence, as in the detector training steps in Sect. 4, two 500-sample data sets are collected from the cases of τ =  30 and τ =  17, respectively, and cascaded together as one compact training set of 1,000 samples. The related NSA parameters are given as follows:

  • number of detectors: S =  100,

  • window width: N =  10,

  • matching threshold: λ =  5.25,

  • margin coefficient: γ =  0.05λ =  0.26.

Fig. 5
figure 5

Mackey–Glass time series (a τ =  30, b τ =  17, c fresh anomalous data for verification)

To validate the neural networks-based NSA, the fresh anomalous Mackey–Glass time series shown in Fig. 5c are used, where those samples from 1 to 500 correspond to τ =  30, and the succeeding 500 samples τ =  17. The anomaly detection results acquired from the untrained (before training) and trained (after training) NSA detectors are demonstrated in Fig. 6a and b, respectively. There are only 100 detector instants illustrated in Fig. 6, since the window width N is 10. Note that the untrained detectors are randomly initialized and generated from the normal Mackey–Glass time series, i.e., the first half of the 1,000-sample training data set. Apparently, before training, there are:

$$M = 57,\enspace L = 9, \quad \hbox{and} \quad \eta = \frac{M}{{M + L}} \times 100\% = 86\%, $$

and after training there are:

$$M = 31, \enspace L = 1, \quad \hbox{and} \quad \eta = \frac{M}{{M + L}} \times 100\% = 97\% .$$

Although the proposed detector adaptation algorithms decrease the numbers of both incorrectly and correctly activated detectors, η actually has grown from 86 to 97%. In other words, an improved anomaly detection performance can be achieved by using this neural networks-based NSA.

Fig. 6
figure 6

Anomaly detection in Mackey–Glass time series (a before detector training, b after detector training)

5.2 Inner raceway fault detection of motor bearings

Bearings are indispensable components in rotating machinery. Therefore, appropriate monitoring of their conditions is crucial to ensure the normal operating status of motors [18]. Since defect on the inner raceway is a typical bearings fault, the proposed neural networks-based NSA is examined using this fault detection problem. The bearings feature signals are collected at the sampling frequency of 20 kHz from a vibration sensor mounted on top of the NYLA-K eight-ball bearings. The model of the vibration sensor used is IMI Sensors 601A01. The motor is a three-phase industrial motor of 0.5 horsepower manufactured by the Baldor Electric Company, which has the rotation speed at 1,782 rpm.

Figure 7a and b show the vibration signals from the healthy bearings and those bearings with the existing inner raceway fault, respectively. In the preprocessing unit, for the convenient manipulation with detector matching error E in (3), a small constant 0.08 is added to all the signal samples so that both the training and testing data sets contain only positive values. The parameters of the neural networks-based NSA are:

  • number of detectors: S =  100,

  • window width: N =  5,

  • matching threshold: λ =  4.6,

  • margin coefficient: γ =  0.05λ =  0.23.

Fig. 7
figure 7

Features signals of healthy and faulty bearings (a healthy bearings, b bearings with inner raceway fault)

The relationship between the fault detection rate η and training epochs is depicted in Fig. 8. Fresh feature signals from faulty bearings are deployed as the verification data. It is clearly visible that η increases with moderate oscillations, when the number of training epochs grows. These oscillations are due to the nature of the gradient descent principle. In the simulations, η is only 79% (M =  27 and L =  7) at the beginning of training, but finally reaches 100% (M =  15 and L =  0) after about 800 iteration steps. This indeed demonstrates the neural networks-based NSA can significantly improve the fault detection rate. Nevertheless, the inner raceway fault detection problem is used here only as a simplified testbed, and many practical details are not considered.

Fig. 8
figure 8

Fault detection rate η of neural networks-based NSA

6 Conclusions

In this paper, a neural networks-based NSA is first introduced. Its novel structure is described, and the corresponding learning algorithm is also derived. The applications of the new NSA in fault diagnosis are next discussed. Both simulated and real-world data is employed for the validation of the proposed scheme. Improved anomaly detection and bearings fault detection performances are achieved in numerical simulations. The optimization of those NSA detectors with the clonal selection method is at present under investigation [19]. Furthermore, study of time-varying fault diagnosis problems using this approach will be another interesting topic.