1 Introduction

In recent years, security threats, attacks and intrusions in network infrastructures have become one of the major causes of great losses and massive leaks of sensitive data. Countless mechanisms are used to minimize, detect and counter these security issues.

Anomaly detection is one of the techniques proposed to ensure the integrity and the confidentiality of data. In general, anomaly/outlier detection can be seen as a normal/anomaly classification problem [32]. Several modern techniques in the literature address this issue using neural networks [78], Bayesian networks [17], clique clustering approaches [10, 38, 67], bandit clustering [26, 29, 35, 52, 55], support vector machines (SVM) [65], fuzzy logic [82], graph theory [37, 54, 69], decision trees [56], genetic programming [21, 34], artificial immune systems (AIS) [11, 16, 79, 80] and more.

The biological immune system (BIS) has several properties, such as robustness, error tolerance, decentralization, recognition of foreigners, adaptive learning and memory, which make it a very complex and promising source of inspiration for several domains. Artificial immune systems, the field that tries to mimic the complex mechanisms of the BIS, have been the focus of much research since the early 1990s [22] to tackle complex engineering problems. Theories and algorithms were proposed and exploited for pattern recognition, data mining, optimization, machine learning and anomaly detection, to name only a few [3].

Indeed, the anomaly detection approach [5, 8] in network security relies on building normal models or profiles and discovering variations/deviations from the model in the observed data. This process is strongly similar to the main objective of the biological immune system. Several models have been proposed that imitate BIS mechanisms such as clonal selection, negative and positive selection and immune cell networks [95].

In this paper, we propose a negative selection algorithm, namely Negative Selection for Network Anomaly Detection (NSNAD), which includes the following contributions:

  (a)

    We propose a filter/ranking-based feature selection using the Coefficient of Variation. The advantage of this statistical metric is twofold: first, it is independent of the class, and second, it can be measured regardless of the attribute’s scale and unit.

  (b)

    Most previous works dealt with nominal attributes by coding them with iterative integers or by binarizing them. The first approach depends on the order of the nominal feature’s categories, which means that different orderings will yield different numerical values and thus a biased classification. The second requires more computational time and memory resources, as each value is represented by an additional binary attribute. In our work, we replace the nominal attributes by their occurrence probabilities when statistical operations are carried out (feature selection phase). Otherwise, we handle them as strings.

  (c)

    We noticed that traditional negative selection implementations usually generate detectors as random binary sequences with an R-chunk (r consecutive bits) matching against binary strings representing the self. In our work, we consider both the real and string representations of all attributes in the dataset. We randomly choose instances from the unlabeled training data to be detectors, and we validate them against self-data in each dimension. The unlabeled training data contain both normal and anomaly records.

  (d)

    Our detector radius is border-based, which means that every detector has its own range of activation, corresponding to the distance between the detector and the border instances in self-data.

  (e)

    Furthermore, in order to identify new attacks, we optimize the detection phase with an additional verification against the self-space. Inspired by the biological Human Leukocyte Antigen (HLA), we define an Artificial HLA as the volume of the self-space and ensure that the incoming instance is not actually a “self” before classifying it as an anomaly.

  (f)

    Finally, we evaluate our approach not only under the NSL-KDD dataset, which has for the last two decades been the most widely used benchmark for the test and evaluation of IDSs, but also under two more up-to-date datasets: Kyoto2006+ and UNSW-NB15.

The remainder of this paper is organized as follows: Section 2 presents some background and related work regarding biological and artificial immune systems as well as feature selection techniques. Section 3 details each phase of our proposed algorithm. Section 4 presents the experimental design. The results, analysis and discussion are provided in Sect. 5. An extensive comparison with AIS-based intrusion detection techniques is given in Sect. 6. We finally draw some concluding remarks in Sect. 7.

2 Background and related works

In this section, we present some background on biological and artificial immune systems; we briefly explain the biological immune response and point out the artificial theories inspired from each step of this process. We also review some previous work done in the field of intrusion detection using artificial immune system (AIS) approaches. We discuss feature selection algorithms and their classification, and finally, we provide a brief comparison between our algorithm and other related work.

2.1 Biological and artificial immune systems

The biological immune system (BIS) responds to an intrusion or any pathogen through two types of immunity: innate and adaptive immunity [85] (Fig. 1).

Fig. 1 Biological immune system components

The innate immunity, also known as non-specific immunity, is considered the first line of defence. It consists of four categories of barriers: (1) anatomical, which includes the skin, the mucus, etc., (2) physiological, like the temperature and the pH, (3) phagocytic, such as the macrophages and the polymorphonuclear cells, and (4) inflammatory, such as antibacterial activity. The phagocytic and cytotoxic cells, known as Natural Killer (NK) cells, are the key agents that ensure the pathogen’s termination. If the innate immunity fails to destroy the pathogen, it triggers a specific immune response.

The adaptive immunity, on the other hand, is said to be specific because it only responds to a particular pathogen. Each pathogen carries a certain shaped protein called an antigen. The lymphocytes recognize each cell of their own body as self through the Human Leukocyte Antigen (\(HLA_1\) and \(HLA_2\)). The first class of HLA (\(HLA_1\)) has a ubiquitous expression: it is expressed on the surface of all nucleated cells, including the Antigen Presenting Cells (APC), whereas the second class, \(HLA_2\), is expressed only by the Antigen Presenting Cells, such as dendritic cells, macrophages and Lymphocyte B (LB) cells [70].

Any other antigen is flagged as foreign and has to be destroyed. To do so, the immune response produces killer lymphocytes (B and T) and antibodies that target this one particular foreign antigen, as well as memory cells that enhance the immune response in case of a second exposure to the same pathogen [72].

The BIS is a very complex system, and its interaction and defence mechanisms are still being discovered. Indeed, immune cell interactions during the specific immune response, their proliferation and their maturation process have led to the definition of the main artificial immune theories and models, namely the clonal selection theory, the negative and positive selection theories, the immune network theory and the Danger theory [23]. Moreover, BIS features have been a great source of inspiration for many researchers in several fields such as pattern recognition, optimization and anomaly detection. These features can be summarized as follows:

  • The self/nonself-discrimination During its maturation, a T cell precursor can turn into either an LT4 or an LT8 cell through the positive selection process. Based on a survival signal delivered to the lymphocytes that can identify one of the HLA classes with a small affinity, it becomes an LT4 cell if it recognizes HLA class II (\(HLA_2\)) or an LT8 cell if it recognizes HLA class I (\(HLA_1\)).

    Thereafter, the LT cells that recognize the HLA paired with a self-peptide “too well” (self-reactive T cells) must be eliminated. This elimination (apoptosis) process is called negative selection and has been an inspiration for several classification models [24].

    This maturation process leads to the generation of two types of mature LT cells (LT4 and LT8) capable of discriminating between the self-antigens, through HLA, and the foreign antigens during the immune response.

  • Immune response APCs are cells that present antigenic peptides on their surface along with \(HLA_1\) or \(HLA_2\) to recruit \(LT_8\) or \(LT_4\) cells, respectively. The \(LT_8\) cells become LT cytotoxic, and the \(LT_4\) cells become T-helpers. Those T-helpers are involved in the clonal selection process [77] and the maturation of LB cells to plasmocytes in order to produce antibodies.

    Several algorithms inspired by the biological clonal selection have been proposed for classification, clustering and pattern recognition [15].

  • Memorization and distribution Some immune cells become memory cells for a specific foreign antigen once the first exposure to it activates an immune response. They ensure quicker and more effective immune responses from the BIS without going through the recognition and affinity maturation process again.

In addition to all the above-cited characteristics, self-regulation, decentralized functioning, immune response adaptation and cell proliferation are other properties that inspired several models as solutions to real and complex problems [12]. Table 1 summarizes the immunological concepts, the models inspired therefrom and their use in computational problems.

Table 1 Immunity-based computational models and specific immunological concepts [23]

2.2 AIS in intrusion detection

Several researchers exploited adaptive biological immunity mechanisms, mimicked them and applied them to solve several real-world engineering problems. The ability of the human body to automatically distinguish between self-cells and nonself-cells in order to protect itself from harmful pathogens is highly consistent with the concept of intrusion detection. Hence, we are interested in applying this mechanism to the intrusion detection field.

Forrest and her team introduced in 1994 one of the first applications of AIS in intrusion detection [28]. They identified the problem of protecting computer systems as the problem of learning to distinguish self from nonself. Their algorithm runs in two steps: (i) generation of a set of string detectors that do not match any of the protected data and (ii) monitoring of the protected data by comparing them to the detectors. Any change in the data that activates the detectors is considered a potential intrusion.

Later on, they designed an artificial immune system framework (ARTIS) [40] and applied it to the network intrusion detection domain by implementing ARTIS as LISYS (Lightweight Intrusion detection SYStem).

An immune-based network intrusion detection system (AINIDS) was proposed in [94]. AINIDS has two main components: antibody generation and antigen detection. It includes the generation of passive immune antibodies to detect known attacks and automatic immune antibodies that integrate statistical methods with fuzzy reasoning systems to detect novel attacks. Experiments were carried out on data collected from the authors’ LAN and on the DARPA dataset.

Others like Hong [41] presented a hybrid immune learning algorithm that combines the advantages of the real-valued negative selection algorithm (RNSA) and a classification algorithm, mainly to find a boundary between the normal and anomaly classes.

More recently, Shen et al. [83] applied Rough Set Theory feature selection to the KDDCup99 dataset and used a negative selection algorithm to detect anomalies.

Zhang et al. [100] proposed an integrated intrusion detection model based on artificial immunity (IIDAI), a vaccination strategy and a method to generate initial memory antibodies with Rough Set (RS). IIDAI integrates a misuse detection model as well as an anomaly detection model.

Seresht and Azmi [81] proposed a multi-agent AIS-based approach for a distributed intrusion detection system. Multiple functionalities of the proposed IDS (named MAIS-IDS) were inspired by the AIS paradigm, such as cloning, mutation and collaboration between agents. MAIS-IDS is a hybrid anomaly IDS (host and network) that uses agents in virtual machines where the network traffic is analyzed. The authors used a small portion of NSL-KDD to test the performance of their system, with 19 features chosen from the literature. The results showed that the accuracy and false alarm rates reached 88% and 14%, respectively.

Mohammadi et al. [60] presented a real-time anomaly detection system based on a probabilistic artificial immune algorithm. A first version, named SPAI (Simple Probabilistic Artificial Immune method), used the probability density function as self-detectors. As the computational cost proved to be quite high, the authors proposed a second version, namely CPAI (Clustered Probabilistic Artificial Immune algorithm), where the normal profile was clustered into subgroups. These subgroups were given priority values in a third version of the algorithm in order to enhance the response time.

Ghanem et al. [30] presented a hybrid approach for anomaly detection using generated detectors, based on a multi-start metaheuristic method and genetic algorithms. Detectors were generated using self- and nonself-training data, which was strongly inspired by negative selection-based detector generation. The approach reached a 96% detection rate and a 7% false positive rate when evaluated on the NSL-KDD dataset.

Abas et al. [1] applied Rough Set Theory (RST) to the gure-KddCup6percent dataset in order to eliminate irrelevant and redundant features. Then, they tested the R-chunk and negative selection algorithms on the six resulting features. The accuracy of the experimental evaluation reached 90%.

Igbe et al. [43] proposed a framework for a distributed network intrusion detection system (d-NIDS) based on the negative selection algorithm (NSA). They used a genetic algorithm to generate a set of detectors that were distributed among the IDS participants. The performance of this framework was evaluated using the NSL-KDD dataset; an average detection rate of 98% and a false positive rate of 1.77% were recorded.

The authors in [80] presented an AIS-based anomaly detection and prevention system. They introduced self-tuning and detector power to achieve dissimilarity among the detectors in order to cover as much space as possible with minimum computational cost. Experiments were carried out using different percentages of the KDDCup99 training dataset. They studied and compared the effects of the dataset and the affinity threshold on detector generation as well as on the detection rate and false positive rate. The results showed a high detection rate (100%) but important false alarm values (68%).

2.3 Feature selection

Feature selection is an important data preprocessing phase used to remove irrelevant, redundant and noisy data with respect to the description of the problem at hand. It aims to improve the data processing performance and reduce the computational complexity.

Feature selection has been the focus of researchers in many fields since the early nineties [51]. Various methods have been proposed, tested and discussed. They can globally be classified into two categories: filter-based and wrapper-based methods.

2.3.1 Filter-based methods

These methods are based on a performance evaluation metric deduced from general characteristics of the training data (Fig. 2). They are carried out once, and the output can be provided to different classifiers. Filters are commonly independent of the learning algorithm. They are known for their low computational complexity and good generalization ability.

Some filters provide a feature ranking rather than an explicit best feature subset, and the cutoff point in the ranking is usually chosen via cross-validation. Among feature selection and feature ranking techniques, we find: correlation-based feature selection (CFS), fast correlation-based feature selection (FCBF), gain ratio attribute evaluation (GR), information gain (IG), chi-squared evaluation and others [46].

Fig. 2 Filter-based feature selection

2.3.2 Wrapper-based methods

As for wrapper methods, they use the feedback of a classification algorithm to assess the quality (efficiency) of feature subsets (Fig. 3). These subsets are usually created using some search strategy. Even though the wrapper approach has the advantage of handling possible interactions between features, it does not generalize as well as the filter approach does. This is mainly because it leans toward the specific learning algorithm used to choose the best feature subset. Furthermore, wrappers are restricted by the time complexity of the learning algorithm, which increases rapidly as the data get larger.

Fig. 3 Wrapper-based feature selection

2.4 Feature selection in intrusion detection

In this section, we provide a brief overview of feature selection techniques applied in the intrusion detection area using different datasets, including UNSW-NB15, the latest benchmark provided by the Australian Centre for Cyber Security [63].

Kayacik et al. [47] investigated the relevance of each feature of the KDD99 intrusion detection dataset in discriminating normal behavior from attacks using information gain (IG). Based on the entropy of a feature, IG measures the role of this feature in predicting the class label; a feature is said to be relevant if its IG value is close to 1. Since IG is measured for discrete values, the authors preprocessed continuous features by partitioning each one into equal-sized partitions using equal frequency intervals. The authors reached the conclusion that the normal, neptune and smurf classes are highly related to certain features, which makes their classification easier compared to other types of attacks that fall into the U2R and R2L categories.

Gonzalez-Pino et al. [31] integrated a feature selection process based on information gain (IG) into their intrusion detection system, which deploys data mining and fuzzy logic approaches. In order to efficiently detect malicious events and be able to analyze large amounts of data in real time, an IDS has to select attributes that are relevant enough for an adequate detection profile with a low FPR. To this end, the authors used decision tree learning with the IG algorithm to select features from the DARPA99 dataset. They limited their experiments to evaluating the detection of the ipsweep attack and compared the results with and without the IG algorithm.

The article in [66] presents a study of four filter-based feature selection methods using different classification algorithms (5-Nearest Neighbor, C4.5 decision tree and Naïve Bayes) under the Kyoto2006+ dataset for intrusion detection. The authors performed ANalysis Of VAriance (ANOVA) along with Tukey’s Honestly Significant Difference (HSD) test to compare the performance of three feature rankers: signal-to-noise ratio (S2N), chi-squared and AUC. The authors concluded that, overall, filter-based rankers perform better than the feature subset evaluation method and that, among the feature rankers, S2N performs best.

In [9], the authors investigated the minimal subset of the most relevant features in the NSL-KDD dataset. They used the Correlation Feature Selection (CFS) technique to filter attributes highly correlated with the class and uncorrelated with each other. Different search methods were used to build subsets, and the Naïve Bayes algorithm was used to compare the results. The experiments point out 12 features from the complete set that are commonly selected by all search methods as the most important and reliable attributes. The accuracy and FPR reported under these 12 attributes for the U2R attack category are 65.43% and 0.01%, respectively.

Another article [75] studied feature selection for machine learning algorithms under the NSL-KDD dataset. The authors tried to find the optimal feature subset using Discretized Differential Evolution (DDE), a population-based search technique, and the C4.5 decision tree algorithm. The classification performances under the training and testing sets with tenfold cross-validation reached 99.01% and 82.37%, respectively, with average FPRs of 0.007% and 0.15%.

The authors in [48] tried to minimize the computational time of machine learning and data mining techniques on high-dimensional data using feature selection. They investigated a wrapper approach based on a GA as the search strategy and logistic regression as the learning algorithm. They used various subsets of both the KDD99 and UNSW-NB15 datasets for their experiments. The proposed algorithm selected an average of 18 and 20 attributes from KDD99 and UNSW-NB15, with accuracies reaching 99.3% and 92.5%. Moreover, the same features applied to C4.5, RF and NBTree reached average accuracies of 99.7%, 99.8% and 99.7% under KDD99 and 80.6%, 80.45% and 80.2%, respectively.

Janarthanan and Zargari [44] explored relevant features in both NSL-KDD and UNSW-NB15 using machine learning techniques. Since the two datasets are significantly different, the authors compared their results against previous works applied to both datasets with the Random Forest algorithm. They employed several feature selection algorithms implemented in the Weka tool, such as CfsSubsetEval with the GreedyStepwise method and InfoGainAttributeEval with the Ranker method. Experiments were carried out using two feature subsets, one extracted from previous work ([99] for KDD99 and [62] for UNSW-NB15) and the other proposed by the authors. The results show that the second subset improves the Kappa statistic, which indicates better detection rates.

2.5 Comparison with related work

Most of the related work cited above used a binary representation of the data flow without fully explaining the conversion process from raw or featured data connections into binary strings, or the computational cost of such an operation. Moreover, when featured connections are considered without binary transformation, authors tend to deal with nominal attributes by coding them with iterative integers, which usually leads to biased classification results. In this work, we replace the nominal attributes by their occurrence probabilities when statistical operations are conducted (as in feature selection). Otherwise, we handle them as nominal in order to gain in efficiency and make the most of the information provided by this type of feature.

Besides, most of the empirical analyses are performed exclusively under the KDDCup99 dataset, which has been for the past two decades the benchmark for the evaluation of IDSs and the only publicly available labeled dataset, even though it is largely outdated.

We detail, in this paper, the full process of our proposed algorithm based on the negative selection theory, from the feature selection phase with a multi-type representation (real, nominal), through detector generation, to the decision rules and the classification phase. We test and evaluate our approach under the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets.

As for feature selection, a great amount of work has been done regarding this preprocessing phase in network intrusion detection, but almost all of the published results used the DARPA 98 [39] or KDDCup99 dataset [44, 47, 48, 68], and few used the NSL-KDD [9, 75] or Kyoto2006+ [4, 6, 66] datasets. An increasing number of studies have been carried out using the new benchmark data for intrusion detection, namely UNSW-NB15, developed by Moustafa and Slay [63] at the Australian Centre for Cyber Security. Table 2 gives a summarized overview of the feature selection techniques discussed in the related work.

We propose, as a feature selection module, a new filter-based algorithm that exploits the Coefficient of Variation (CV) to rank the attributes of datasets. The algorithm, Coefficient of Variation-based Feature Selection (CVFS), is further explained in Sect. 3.2, and the results are presented in Sect. 5.1.

Table 2 Summary of feature selection related work

3 NSNAD description

3.1 Overall architecture

The overall architecture and main components of the negative selection algorithm for network anomaly detection (NSNAD) are depicted in Fig. 4. The training dataset is passed through a filter/ranking-based feature selection, which results in a subset of relevant features with low dispersion. NSNAD is based on semi-supervised classification, in the sense that the training data used as input contain labeled as well as unlabeled records [102]. The labeled instances represent normal or self-cells in the biological sense. These self-data are used to validate a set of detectors randomly picked from the complete unlabeled training data with both classes. A radius for each detector is computed and used to classify instances from the test set as anomalies.

NSNAD detectors can be assimilated to LT4 cells recognizing only AGs presented by APCs along with an \(HLA_2\). An interaction with the AG allows it to be classified as “anomaly”. However, if the latter is not flagged by any detector, the possibility that it is (by analogy) a cell expressing only \(HLA_1\) (not recognized by LT4 cells) cannot be disregarded. This led us to add a second verification based on an \(Artificial-HLA_1\). It consists in comparing the structure of the AG (instance) with the self’s \(HLA_1\). If these structures are close enough, the instance is considered “normal”; otherwise, it is classified as “anomaly”.

Fig. 4 Block diagram of NSNAD

3.2 Feature selection

In the preprocessing phase, we introduce a feature selection technique that falls into the filter-based category, as it uses a statistical metric to rank the features. The cutoff point in the ranking can be fixed in two different ways: either using a subjective threshold or using cross-validation feedback. The Coefficient of Variation (CV) is the statistical metric we use to define the most relevant attributes in terms of dispersion. It is expressed as follows:

$$CV_i = \frac{\sigma_i}{\mu_i}, \quad i = 1,\ldots,p$$
(1)

where \(\sigma _i\) and \(\mu _i\) are the standard deviation and the mean of the attribute i in the training data.

We compute the coefficient of Eq. 1 for each attribute in the training data regardless of the instances’ class. In fact, one of the advantages of this technique is that it does not need a label attribute in the ranking process, unlike other filter-based algorithms. Another advantage of CVFS is that it makes it possible to compare attributes with different scales and units. Moreover, CVFS captures both the dispersion and the homogeneity of attributes depending on whether their values are high or low.

The choice of whether to keep features with high or low CV values depends on the field of application. For instance, in radar image processing, the CV is used as a filtering criterion: a CV value below a threshold triggers the application of a filter. It is also used, for values lower than a threshold, to select in multi-temporal radar images the targets with a stable temporal backscatter.

In a multi-class classification case with p attributes, a high CV value of an attribute usually reflects its important contribution to class discrimination. However, in anomaly detection, an intrusion is identified if it deviates from a normal behavior. Ideally, this normal behavior should be projected as a compact sphere in the multidimensional space.

Since NSNAD is a semi-supervised algorithm, we rather seek the least dispersed attributes, those that allow us to project normal instances as tightly as possible in order to reduce false negatives and optimize the detection rate.

Before applying the CVFS process, we excluded some features of the Kyoto2006+ dataset that, we believe, are irrelevant for the overall classification approach. We left out the source and destination IP addresses so that the method is independent of these particular addresses. We also omitted the features related to security analysis produced by specific software (such as Clam AntiVirus and Snort) so that our method can be generalized to other networks that might not have such software.

We ordered the features in ascending order according to their CV values so that we can choose the best feature subset. In fact, there are two ways to determine the number of attributes to retain in a subset: (1) by choosing a threshold for the CV beyond which a subset of attributes is kept; in this case, the threshold is entirely subjective and may not yield the best subset for a given classifier; (2) by using another method that exploits the tested classifier’s feedback to choose the best attribute subset. In our work, we adopt the second method, following these steps:

  • Fix the cardinality of the smallest feature subset, and let it be \(|FTsubset_0|\).

    \(|FTsubset_0| < |FT|\), with \(|FT| =\) number of features in the data.

  • Choose an evaluation metric, e.g., f-measure;

  • Run the algorithm using k-fold cross-validation (\(k = 6\)) against the training data;

  • Add attributes incrementally with respect to the CV-based ranking;

  • Choose the feature subset that gives the highest f-score to carry out further experiments and comparisons.

In this case, the algorithm has to be run at most \([(|FT|-|FTsubset_0|) + 1] \times 6\) times. The process’s results are detailed in Sect. 5.1.
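To make the procedure concrete, the following is a minimal Python sketch of the CVFS ranking and the incremental cross-validation search, assuming a pandas DataFrame as input; `clf_factory` is a hypothetical stand-in for the evaluated classifier:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score

def cvfs_rank(train: pd.DataFrame) -> list:
    """Rank attributes by ascending Coefficient of Variation (Eq. 1).
    Nominal attributes are first replaced by the occurrence
    probabilities of their values (contribution (b) in Sect. 1)."""
    cv = {}
    for col in train.columns:
        x = train[col]
        if not pd.api.types.is_numeric_dtype(x):
            x = x.map(x.value_counts(normalize=True))  # frequency encoding
        mu = x.mean()
        cv[col] = x.std() / mu if mu != 0 else np.inf
    return sorted(cv, key=cv.get)  # least dispersed attributes first

def best_subset(train, labels, ranked, clf_factory, k0=5, folds=6):
    """Grow the subset incrementally (in CV order) and keep the one
    with the highest cross-validated f-measure."""
    best, best_f1 = ranked[:k0], -1.0
    for m in range(k0, len(ranked) + 1):
        f1 = cross_val_score(clf_factory(), train[ranked[:m]], labels,
                             cv=folds, scoring="f1").mean()
        if f1 > best_f1:
            best, best_f1 = ranked[:m], f1
    return best
```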

3.3 Detector set generation

NSNAD is based on the negative selection algorithm, known as the principle of self/nonself-discrimination of the immune system. In the proposed algorithm, normal records from the train set represent the “self”. The detectors, on the other hand, represent the matured LT4 immune cells that do not match any self-data.

Along with the normal instances from the train set, the complete unlabeled training data are provided to NSNAD in order to generate the detectors. Detector candidates are picked randomly from the complete unlabeled dataset and validated against the normal samples with regard to each feature:

Let \(TR \in R^p\) be the training data, featured by p attributes and containing both classes (normal and anomaly).

\(S \subset TR\) is a set of normal instances (representing self-antigens) and \(r_\mathrm{self}\) their radius. \(r_\mathrm{self}=\{r_1,r_2,\ldots ,r_p\}\) is a p-dimensional vector with:

$$r_i = \begin{cases} \alpha \times \sigma_{i_\mathrm{self}}, & \text{if } i \text{ is numeric}\\ \text{the string value with the largest number of occurrences in } S, & \text{if } i \text{ is nominal} \end{cases}$$
(2)

where \(\alpha\) is a regulation factor \(\in [1,3]\). For a confidence interval of approximately 95%, we set \(\alpha = 2\) (refer to the three-sigma, or empirical, rule [27]).

\(\sigma _{i_\mathrm{self}}\) is the standard deviation of the attribute i in S.

To construct a set of detectors \(DT \in R^p\), we validate each detector candidate instance \(X =\{x_1,x_2,\ldots ,x_p\}\), randomly picked from the unlabeled TR, against the self-data according to the nature of each attribute i as follows:

$$\begin{aligned}&\text{if } i \text{ is nominal:}\quad x_i \ne r_i\\ &\text{if } i \text{ is numeric: for } j = 1,\ldots,|S|,\ \text{if } \left\{\begin{array}{l} |x_1 - s_{1j}| > r_1\\ \text{and } |x_2 - s_{2j}| > r_2\\ \text{and } \ldots\\ \text{and } |x_p - s_{pj}| > r_p \end{array}\right\} \text{then } DT = DT \cup X \end{aligned}$$
(3)

This means that the distance between the instance X and all “normal” instances in S has to be greater than the self-radius \(r_\mathrm{self}\) in each and every attribute.

If these conditions are met, X is added to the detector set \(DT\). Otherwise, another candidate instance is randomly picked from TR and validated with Eq. 3 as explained above. This process is repeated until DT contains a predefined number of detectors or all the instances have been picked.

Moreover, we assign to each detector d from DT a radius rd used to test its activation in the classification phase. This radius represents the lowest Manhattan distance between d and the self-instances (Eq. 4).

$$rd = \min_{j=1,\ldots,|S|} \mathrm{dist}(d,s_j), \quad \text{where } \mathrm{dist}(d,s_j) = \sum_{i=1}^{p} |d_i - s_{ij}|$$
(4)

Pseudocode 1 details the complete process.

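Since Pseudocode 1 appears as a figure in the original article, the following is a minimal Python sketch of the detector generation loop described by Eqs. 2–4, restricted to numeric attributes for brevity (the nominal case would compare string values against the modal value of Eq. 2):

```python
import numpy as np

def generate_detectors(TR, S, nbr_detectors, alpha=2.0, seed=None):
    """Negative-selection detector generation (Eqs. 2-4).
    TR: (n, p) unlabeled training data (normal + anomaly records).
    S : (m, p) self (normal) instances."""
    rng = np.random.default_rng(seed)
    r_self = alpha * S.std(axis=0)          # Eq. 2, numeric case
    detectors, radii = [], []
    for idx in rng.permutation(len(TR)):
        x = TR[idx]
        # Eq. 3: x must lie farther than r_self from every self
        # instance in every dimension to become a detector
        if np.all(np.abs(x - S) > r_self):
            detectors.append(x)
            # Eq. 4: radius = lowest Manhattan distance to self-data
            radii.append(np.abs(x - S).sum(axis=1).min())
            if len(detectors) == nbr_detectors:
                break
    return np.array(detectors), np.array(radii)
```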

3.4 Classification

The classification phase is a two-stage process. First, the radii of the generated detectors are used to identify and classify an incoming instance as “anomaly”. If this instance is not within the radius of any detector, it is compared to an A-HLA class I (Artificial Human Leukocyte Antigen) to be classified as “normal”.

For an input instance a to activate a detector d, the Manhattan distance between d and a must be at most equal to the detector’s radius rd (Eq. 4). In other words, for an instance a to be classified as an anomaly, the inequality in Eq. 5 should be satisfied for at least one detector in DT.

$$\mathrm{dist}(a,d) = \sum_{i=1}^{p} |a_i - d_i| \ \le \ rd$$
(5)

If none of the detectors in DT has been activated, the instance a could be classified as normal/self. However, in our work, we first compare a to the Artificial HLA (A-HLA), analogously to HLA class I (see Sect. 2.1). We perform this additional verification in order to detect any new attack that is not covered by the detectors. We define our A-HLA as the volume of the self-data, \(V_\mathrm{self}\) (Eq. 6), where S is the self-space and \(\mu\) its mean.

$$V_\mathrm{self} = {\left(\frac{2}{\sqrt{p}}\right)}^p \prod_{j=1}^{p} \max_{i=1,\ldots,|S|} |s_{ij} - \mu_j|$$
(6)

In this case, for an incoming instance a to be classified as normal, it should:

  • not activate any detector in DT,

  • and satisfy the inequality in Eq. 7 regarding the volume of the self-space.

    $$\left(\frac{2}{\sqrt{p}}\right)^p \prod_{j=1}^{p} |a_j - \mu_j| < V_\mathrm{self}$$
    (7)

Pseudocode 2 summarizes the classification process.

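As Pseudocode 2 is likewise given as a figure, a hedged Python sketch of the two-stage decision of Eqs. 5–7, again for numeric attributes only, could read:

```python
import numpy as np

def self_volume(S):
    """Volume of the self-space, V_self, and the self mean (Eq. 6)."""
    p = S.shape[1]
    mu = S.mean(axis=0)
    v = (2.0 / np.sqrt(p)) ** p * np.prod(np.abs(S - mu).max(axis=0))
    return v, mu

def classify(a, detectors, radii, mu, v_self):
    """Two-stage NSNAD classification of one instance a (Eqs. 5-7)."""
    # Stage 1: any activated detector flags the instance (Eq. 5)
    if np.any(np.abs(a - detectors).sum(axis=1) <= radii):
        return "anomaly"
    # Stage 2: A-HLA verification against the self volume (Eq. 7)
    p = len(a)
    v_a = (2.0 / np.sqrt(p)) ** p * np.prod(np.abs(a - mu))
    return "normal" if v_a < v_self else "anomaly"
```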

4 Experimental design

Three baseline algorithms are chosen to compare against NSNAD’s performance: Naïve Bayes, K-means and One-class SVM.

Naïve Bayes is based on Bayes’ theorem; it uses labeled input samples to build a model, from which an unknown sample is classified according to a likelihood probability [71]. In this work, we exploit the Java “NaiveBayes” package from the Weka tool [36].

K-means, on the other hand, is a clustering algorithm that does not need any prior knowledge of the data distribution or labels. It rather aims to partition n samples into k groups [53]. The training set is partitioned into two groups with k-means; incoming samples from the test set are then assigned to the nearest cluster (using the Euclidean distance). It should be noted that this kind of algorithm makes the implicit assumption that normal samples are more frequent than anomalies, which means that the biggest cluster is assumed to contain normal instances after clustering [76].

As for One-class SVM, it is a semi-supervised variant of the support vector machine (SVM) algorithm [20]. The principle is to learn a decision function for novelty detection, classifying new data as similar to or different from the training set. A kernel function is implicitly used as a similarity measure. The RBF kernel is the function most used with SVM classification algorithms [92]. One of its advantages is the possibility to apply distances other than the Euclidean distance in the exponential expression [14]. Moreover, its "sigma" parameter gives more flexibility with regard to the input space dimension. In this work, we used the “svm” package of the Scikit-learn tool [13, 74].

4.1 Datasets

To assess the performance of any detection approach, experimentation on benchmarks or on sufficiently heterogeneous and realistic data, with up-to-date network attacks, is required [96]. In our work, we evaluated our approach with three different datasets, namely the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets. A brief description of each dataset is given in this section, along with the test subsets used in the empirical study.

4.1.1 NSL-KDD

NSL-KDD is the refined version of KDDCup99, which is known for the deficiencies mentioned in [90]. It has the following advantages over the original KDD dataset [25]:

  • Redundant records are removed so that the classifiers will not be biased toward more frequent records.

  • A sufficient number of records is available in the train and test datasets, which makes it affordable to run the experiments on the complete set. Moreover, the evaluation results of different research works will be consistent and comparable.

  • The number of selected records from each difficult level group is inversely proportional to the percentage of records in the original KDD dataset.

The dataset contains a large volume of network TCP connections, the result of 5 weeks of capture on the Air Force network. Each connection consists of 41 attributes plus a label of either normal or a type of attack. The simulated attacks fall into one of the following four categories:

  • DOS (Denial of service) aims at making a service or a resource unavailable.

  • U2R (User to root) A simple user tries to exploit a vulnerability in order to obtain super user or administrator privileges.

  • R2L (Remote to local) The attacker attempts to gain access (an account) locally on a machine accessible via the network.

  • PROBE represents any attempt to collect information about the network, the users or the security policy in order to outsmart it.

NSL-KDD is actually available in the form of four datasets. Table 3 shows the distribution of normal and attack instances in each dataset.

Table 3 NSL-KDD dataset

In our study, we use the normal instances of the nslKDDTrain+ dataset as self, and we perform our tests under ten subsets of the complete NSL-KDD data (nslKDDTrain+ and nslKDDTest+ combined). Each test subset represents a percentage of the data, as described in Table 4.

Table 4 NSL-KDD test subsets

4.1.2 Kyoto2006+

The Kyoto2006+ dataset is a collection of over 2.5 years of real traffic data (Nov. 2006–Aug. 2009) created by Kyoto University for evaluating IDSs with a much more recent dataset than KDDCup99, which was, for a long time, the only publicly available labeled dataset. The Kyoto dataset was collected from 348 honeypots (Windows XP, Windows Server, Solaris...) deployed inside and outside Kyoto University. All traffic was thoroughly inspected using three security software packages (the SNS7160 IDS, Clam AntiVirus and Ashula); since Apr. 2010, Snort has also been used.

Connections are described by 24 attributes, of which the first 14 were extracted based on the KDDCup99 dataset and the remaining 10 were added to further analyze and evaluate network IDSs. A detailed analysis can be found in [86]. The dataset is available as text files [91] representing the daily traffic labeled as “normal” or “attack”. Figure 5 shows the monthly distribution of the dataset in 2009 after removing duplicate records.

Fig. 5 Monthly distribution of Kyoto2006+ in 2009

This study uses the traffic of January 1, 2009, as the training set and tests the algorithms on the subsets described in Table 5.

Table 5 Kyoto2006+ test subsets

4.1.3 UNSW-NB15

The UNSW-NB15 dataset is the newest benchmark dataset for IDS test and evaluation [64]. It was created by a research group at the Australian Centre for Cyber Security (ACCS). It contains both real modern normal network connections and synthetic attack traffic generated using the IXIA PerfectStorm tool in the Cyber Range Lab of the ACCS. The following categories of attacks, along with normal traffic, were simulated from an updated CVE Web site:

  • Analysis represents different attacks like port scans, spam and script penetration.

  • Backdoors Bypassing normal authentication to secure unauthorized remote access to a resource.

  • DoS A malicious attempt to make a server or a network resource unavailable.

  • Exploits Commands or instructions that take advantage of a vulnerability in a network or program.

  • Fuzzers Injection of massive random data into an application or a network to crash it.

  • Generic An attack that works against all block ciphers (with a given block and key size), without consideration of the structure of the block cipher.

  • Reconnaissance contains all strikes that can simulate attacks that gather information.

  • Shellcode A small piece of code used as the payload in the exploitation of software vulnerability.

  • Worms The worm replicates itself in order to spread to other computers. Often, it uses a computer network to spread itself, relying on security failures on the target computer to access it.

Simulations were performed during 16 h on Jan 22, 2015, and 15 h on Feb 17, 2015. 100 GB of traffic were captured; the first 50 GB were captured with a one-attack-per-second configuration, and the second 50 GB with ten attacks per second [63]. The UNSW-NB15 dataset is available as CSV files at [61]. The number of records in the training and testing sets, as well as their distribution, is given in Table 6.

Table 6 UNSW-NB15 data distribution

Normal records in the train set are taken as the “self” in this study, and the test subsets are described in Table 7.

Table 7 UNSW-NB15 test subsets

4.2 Input parameters

NSNAD has as input parameters: the number of detectors (\(nbr\_detectors\)), the self-data (\(S \in R^p\), \(S \subset TR\)), the unlabeled train set (\(TR \in R^p\)) and the test subsets (\(TS \in R^p\)). The number of detectors is set to 1% of the testing data. Normal records of the train set represent the self. The unlabeled train data, used to generate the detectors, contain both classes (normal and anomaly).

Naïve Bayes classifier constructs a probabilistic model from a set of train data and assigns class labels to problem instances using Bayes’ theorem. It only takes as input a labeled train data with both classes and the unlabeled test sets to be classified.

As for the K-means clustering algorithm, the number of clusters k is required, as well as a similarity distance, along with unlabeled train and test sets. In this work, \(k = 2\) and the Euclidean distance is used for similarity comparison. Once the algorithm clusters the train set into two groups, we assume that the largest cluster is the “normal” traffic and assign test instances to the nearest cluster.

Finally, the input parameters required for the One-class SVM classifier with a radial basis function (RBF) kernel are: gamma, which defines how far the influence of a single training example reaches and is set to \(\gamma = 1/p\), where p is the number of features in the data; and nu, which is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors [73].
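As an illustration, the K-means and One-class SVM baselines could be configured with Scikit-learn roughly as follows (a sketch; `X_train`, `X_train_normal`, `X_test` and the nu value are placeholders, since the text does not fix nu explicitly):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

# K-means with k = 2 (Euclidean distance by default); the largest
# cluster is assumed to represent "normal" traffic
km = KMeans(n_clusters=2).fit(X_train)
normal_cluster = np.argmax(np.bincount(km.labels_))
km_anomaly = km.predict(X_test) != normal_cluster

# One-class SVM with an RBF kernel and gamma = 1/p; trained on the
# normal class only, as it depends entirely on it for classification
p = X_train_normal.shape[1]
ocsvm = OneClassSVM(kernel="rbf", gamma=1.0 / p, nu=0.1)  # nu: assumed
ocsvm.fit(X_train_normal)
svm_anomaly = ocsvm.predict(X_test) == -1  # -1 denotes an outlier
```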

All experiments are performed on a Windows 8 64-bit platform with a Core i7 processor running at 2.40 GHz and 8 GB of RAM.

4.3 Evaluation metric

Several performance metrics are used in this work. Their equations are given in Table 8, where TP (true positives) are anomalies correctly classified, TN (true negatives) are normal events successfully identified as such, FP (false positives) are normal events wrongly identified as anomalies, and FN (false negatives) are anomalies misclassified as normal.

Table 8 Evaluation metrics
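For reference, all of these metrics can be derived directly from the four counts above; a small helper illustrating the standard formulas (a sketch, not the authors' code) is shown below:

```python
import math

def metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics over the four counts."""
    tpr = tp / (tp + fn)                        # detection rate (recall)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"TPR": tpr, "FPR": fpr, "Accuracy": accuracy,
            "Precision": precision, "F-measure": f_measure, "MCC": mcc}
```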

5 Results and analysis

In this section, we present a comprehensive comparison of two feature selection algorithms against CVFS and of three different classifiers against NSNAD. We also aim to highlight the contribution of the \(A\text{-}HLA_1\) step to NSNAD’s overall performance.

The validation process is carried out using the train and test sets of each benchmark dataset. We incrementally varied the size of the provided sets in order to explore the algorithms’ scalability. The distribution of the subsets is described in the previous section, Tables 4, 5 and 7.

Moreover, the results depicted in the manuscript for each sample are the mean of 10 successive runs.

5.1 Feature subset and normalization

At first, the attributes of each dataset (NSL-KDD, Kyoto2006+ and UNSW-NB15) are ranked according to their CV values (the results are given in Table 9). Then, a sixfold cross-validation is performed under the training data with different feature subsets (Fig. 6). These subsets are created incrementally with respect to the order given in Table 9. The sizes of the outcome subsets are 29, 10 and 11 for the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets, respectively. The description of each feature is detailed in Tables 15, 16 and 17 (see “Appendix”).

Fig. 6 NSNAD+HLA versus number of attributes

In order to compare the relevance of our feature subset with other well-known algorithms, we applied the information gain (IG) and Correlation Attribute Evaluation (CAE) algorithms to the train set of each dataset. The ranked attributes are also reported in Table 9. It is worth mentioning that both algorithms need labeled train data to evaluate the relevance of the attributes, which is not the case for CVFS.

We also present, in Fig. 7, the f-measure of NSNAD+HLA, NB, K-means and One-class SVM applied to the complete test sets using the same number of attributes as CVFS (29, 10 and 11 attributes for the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets, respectively). The depicted results show that NSNAD+HLA achieves better f-measure results under all datasets with both feature selection algorithms.

Table 9 Attributes' ranking
Fig. 7 Performances with other feature selection algorithms

Furthermore, we noticed a huge difference in scale between the attributes’ values in the training data, which usually leads to biased results. To overcome this problem, we normalized the numerical attributes using Eq. 8.

$$x_{p\_\mathrm{norm}} = \frac{x_p - \min_p}{\max_p - \min_p}$$
(8)

where \(x_p\) is the pth attribute’s value in the instance x, and \(\max_p\) and \(\min_p\) correspond, respectively, to the maximum and minimum values of the pth attribute in the dataset.
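In code, this column-wise min-max normalization amounts to the following (a sketch over a NumPy array):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each numerical attribute to [0, 1] (Eq. 8)."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)  # guard constant attributes
    return (X - mn) / span
```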

The results of multiple experiments under the various datasets are compared in terms of all the evaluation metrics mentioned in the previous subsection. Hereafter, we present, compare and discuss the results of our algorithm against the Naïve Bayes classifier, K-means clustering and the One-class SVM algorithm.

5.2 True positive rate (TPR)/false positive rate (FPR)

As reported in Tables 10, 11 and 12, both versions of NSNAD exhibit better performance than the other algorithms. Indeed, their average TPR is 93%, 94% and 86%, with an average FPR of 1%, 2% and 6%, when tested under the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets, respectively. These values demonstrate the efficiency of NSNAD in detecting most attacks present in the dataset. The detection rate is further improved with the A-HLA optimization, although the FPR is slightly increased because of new normal behaviors (instances) that are not represented in the train set.

Table 10 TPR/FPR results under NSL-KDD dataset
Table 11 TPR/FPR results under Kyoto2006+ dataset
Table 12 TPR/FPR results under UNSW-NB15 dataset

As for the Naïve Bayes classifier and K-means clustering tested under NSL-KDD, the former displays a relatively better detection rate (~84%) than the latter (~78%). However, the FPR of K-means is much lower (~0.5%) than NB’s (~9%). The main reason is not only the difference in probability distribution between the train and test data, but also the unbalanced types of attacks in the two sets. When it comes to the UNSW-NB15 dataset, the DR is almost the same (~65%), yet the FPR of K-means is nearly six times higher (~29%) than NB’s (~5%), which can be explained by the supervised nature of the NB classifier and the difficulty level of the dataset for unsupervised techniques. The results of NB and K-means under the Kyoto2006+ dataset are, though, not that different: the average TPR and FPR are 85% and 1.5% for both algorithms.

The results of the One-class SVM classifier under all test subsets, however, are very poor, especially the false positive rates, which reach 48%, 45% and 70%. This can be explained by the non-exhaustive representation of the normal class in the train data, as the algorithm depends entirely on the latter for classification.

5.3 ROC and AUC

The ROC (Receiver Operating Characteristic) curves in Fig. 8 visualize the detection rates versus the false positive rates as the classifier’s decision threshold is varied. The test sets used in the evaluation for each dataset are: the full NSL-KDD dataset; the records of January 30, 2009, for the Kyoto2006+ dataset; and the complete test set of UNSW-NB15.

Fig. 8 ROC results under (a) NSL-KDD, (b) Kyoto2006+ and (c) UNSW-NB15 datasets

Meanwhile, the area under the ROC curve (AUC) metric summarizes the trade-off between TPR and FPR. A higher AUC value indicates a high TPR and a low FPR, which is the goal of intrusion detection algorithms.
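In practice, such curves are traced by sweeping the decision threshold over the classifier's anomaly scores; with Scikit-learn, for instance (a sketch, where `scores` and `y_true` are hypothetical arrays):

```python
from sklearn.metrics import roc_curve, auc

# y_true: 1 for anomaly, 0 for normal; scores: higher = more anomalous
fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", auc(fpr, tpr))
```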

The AUC values reported in the plots highlight the algorithm that performs best. Again, NSNAD with the A-HLA optimization displays the best results, with up to 0.995 on average. The worst values are recorded by the One-class SVM classifier, especially under UNSW-NB15, since it is the most challenging dataset. The AUCs of the NB and K-means algorithms reach around 0.90 and 0.80, respectively.

5.4 Accuracy and precision results

The accuracy of a classifier is a global measure that reflects the probability of correctly classified records over the complete data, regardless of a specific class. The precision, on the other hand, is the proportion of correctly detected anomalies among all instances classified as anomalies. A classifier is said to be good if it is both accurate and precise.

Fig. 9 Accuracy results under (a) NSL-KDD, (b) Kyoto2006+ and (c) UNSW-NB15 datasets

Fig. 10 Precision results under (a) NSL-KDD, (b) Kyoto2006+ and (c) UNSW-NB15 datasets

As shown in Fig. 9, NSNAD+HLA is the most accurate algorithm among the five (NSNAD without HLA included). Indeed, its rate of correctly classified samples is up to 97%, with a precision as high as around 98% (Fig. 10). As the graphs show, the A-HLA optimization enhances the accuracy, since it performs an additional verification before the final classification. The improvement is even more pronounced under the NSL-KDD and UNSW-NB15 datasets.

One-class SVM, on the other hand, is the least accurate algorithm, especially under the NSL-KDD and Kyoto2006+ datasets; its accuracy and precision are around 70%.

K-means and NB perform broadly the same when it comes to accuracy. Yet, K-means exhibits better precision than the other algorithms under the NSL-KDD and Kyoto2006+ datasets, due to its low FPR under these datasets.

5.5 F-measure results

This metric considers both precision and TPR. It reflects the overall ability of the classifier to correctly identify anomalies. Figure 11 depicts the F-measure results of the four compared algorithms (NSNAD, NB, K-means and One-class SVM) plus the version of our proposed algorithm without the A-HLA step. It clearly shows not only that our algorithm outperforms the others, with values up to 97%, but also the contribution of the A-HLA step in achieving better results.

Fig. 11 F-measure results under (a) NSL-KDD, (b) Kyoto2006+ and (c) UNSW-NB15 datasets

5.6 Matthews correlation coefficient (MCC)

This coefficient was introduced by Brian W. Matthews in 1975 [59] to evaluate the quality of a binary classification. Contrary to the F-measure, MCC takes into account TP, FP, TN and FN. Indeed, the f-measure results depend on one class (in this case, anomaly); if we consider the normal class, the f-measure values change accordingly. MCC considers the two cases at the same time and involves all the classifier outputs. The comparative results given in Fig. 12 show the MCC values of each algorithm under the chosen benchmark datasets. Here again, NSNAD’s results outperform those of the Naïve Bayes, K-means and One-class SVM algorithms under every dataset. They also highlight that NSNAD with the HLA verification gives slightly better results than its version without the optimization.
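For reference, MCC is computed over the four counts of the confusion matrix as:

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$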

Fig. 12 MCC results under (a) NSL-KDD, (b) Kyoto2006+ and (c) UNSW-NB15 datasets

5.7 Computational time

As for the overall computational time of the proposed approach, it varies from 1 to 16 s \((T(\mathrm{train}) + T(\mathrm{classif}))\). The training phase involves the creation and validation of the detectors as well as the computation of their radii. The classification phase includes the activation of the detectors and the A-HLA verification. The results, as depicted in Figs. 13, 14 and 15, represent the average time incurred by each algorithm in both the training and classification stages over 10 runs.

Fig. 13 Computational time of (a) training and (b) classification under NSL-KDD

Fig. 14 Computational time of (a) training and (b) classification under Kyoto2006+

Fig. 15 Computational time of (a) training and (b) classification under UNSW-NB15

The overall observation from these figures is that the computational time of our approach increases slightly with the size of the test data. Indeed, as the number of detectors depends on the size of the test data, more time is spent during training when the test data are large. Nevertheless, this time does not exceed 8 s, 2.8 s and 9 s under the NSL-KDD, Kyoto2006+ and UNSW-NB15 datasets, respectively. The computational time of the classification step, on the other hand, depends on how well the detectors represent the nonself-space. In fact, the best case (for each instance) is that the incoming instance activates the first detector and is classified as an anomaly, which has most likely been the case under the Kyoto2006+ dataset, with a maximum classification time of 4 s. The worst case is that, after looping over all the detectors, none of them is activated, so the A-HLA verification is executed. This worst case has likely occurred under the UNSW-NB15 dataset, with a maximum classification time of 7 s.

The computational times of One-class SVM are not reported in the graphics due to their high values. Indeed, the mean times that this algorithm spends on the training/testing phases are about 16 min/19 min, 5 min/9 min and 3 min/7 min.

5.8 Summary of results

In order to summarize the overall results of NSNAD compared with the baseline algorithms, Table 13 provides the figures of each metric for each algorithm under the benchmark datasets used in our experiments. The results are for the complete NSL-KDD dataset, the January 30, 2009, subset of Kyoto2006+ and the complete test set of UNSW-NB15.

Table 13 Summary of results

6 Comparison with AIS-based intrusion detection techniques

For comparison purposes, Table 14 summarizes the performance of some recent AIS-based intrusion detection techniques. As can be seen, a large number of these studies used the KDDCup99 dataset for their experiments [18, 19, 60, 79, 83, 89, 90, 93, 97, 98], few of them used the NSL-KDD dataset [30, 43, 60] and others used private [60] or non-network-related [19, 49, 50] datasets. To the best of our knowledge, our work is the only artificial immune-based approach that uses the UNSW-NB15 dataset for its test experiments. In [33], UNSW-NB15 is used to test an approach based on artificial neural networks to detect attacks in cloud infrastructures. In [7], these data are used to evaluate an ontology-based multi-agent IDS model for detecting Web service attacks. Others, like [2], used it along with KDDCup99 and NSL-KDD to test data mining techniques for intrusion detection in network traffic.

The true positive rate of most techniques under the KDDCup99 dataset reaches high values (from 71.5 to 99.2%) with low false positive rates (from 0 to 3%). However, these figures no longer reflect detection in today’s networks, because the database used is nearly 20 years old. It certainly gives a comparative point of view against early papers, but researchers should validate their approaches with more recent datasets in addition to the old benchmarks.

Another important fact to point out from the table is that a large number of papers do not report their technique’s computational time or time complexity. For instance, in [98], the authors reported only the modeling time of their improved clonal algorithm before and after feature selection, without mentioning the settings of the machine they experimented on.

The distributed network intrusion detection system presented in [43] uses a genetic algorithm to generate detectors, yet the authors did not present an estimate of the time complexity, even though GAs are usually time-consuming algorithms, as many previous works have mentioned [42, 58, 84]. Another technique combining GA and AIS is presented in [88], again without reporting the time complexity. The authors used a genetic search for correlation-based feature selection and proposed an AIS-based classifier to classify attacks over the selected features. Neither paper mentions the CPU/RAM settings of the machine used in the experiments.

The authors in [1] claimed that using RST to reduce the features of the GureKDDcup dataset would enhance the computational time of the algorithm, but they did not present any comparative results to support their claim, whereas others, like [93], state that their clonal selection-based method achieves a high detection rate with low time complexity without showing any results regarding the second criterion. Another clonal selection-based technique was proposed in [49], where the authors experimented on three datasets, varying the detector sample size and the antigen sample size to investigate their effects on TPR and FPR, but not on the computational time.

Experiments on the probabilistic AIS-based IDS of [60] were conducted under six different datasets, two of which were generated by the authors. We report in the table the last version’s results in terms of TPR, FPR and computational time. It is worth noting that the computational time is based on a 2 s analysis, which means that the table’s figures correspond to the time the algorithm spends classifying 2 s of monitored traffic. In addition, the authors fail to mention the actual time cost of their algorithm under the KDDCup99 and NSL-KDD datasets. They rather present a speedup ratio against SPAI, the first version of their algorithm, although they stated that the computational cost of SPAI is six times higher than that of RVNS (real-valued negative selection) and four times higher than that of PS (positive selection).

In this work, we evaluated our method under old benchmarks (NSL-KDD, Kyoto2006+) as well as the newest one, namely UNSW-NB15. We discuss and compare the process against three other algorithms with regard to multiple metrics, including TPR and FPR, whose values are as good as those of the other papers summarized in the table. Furthermore, we report the training and classification computational time of each algorithm compared in this paper under each dataset.

Table 14 Comparison with AIS-based intrusion detection techniques

7 Conclusion and future work

The results presented in this paper clearly outline the efficiency that the artificial immune system brings to the field of intrusion detection. Indeed, we proposed a classification process based on the negative selection theory. This process begins with a reduction of the space through feature selection, using the Coefficient of Variation to determine the least dispersed attributes. Detectors are subsequently generated and validated with respect to the normal (self-) data so as to cover the nonself-space via their radii. The classification we proposed exploits, at first, the radii of these detectors to classify an incoming instance as “anomaly”. Then, an additional verification against the volume of the “self” is performed in order to classify the instance as “normal”. We call this latter step the Artificial-HLA optimization. The validation experiments of our process using old benchmarks as well as up-to-date and challenging datasets show a promising trade-off between detection rate, false positive rate and computational time compared to other algorithms.

There are many potential and interesting future works, including:

  • Improve the detector generation process and their distribution over the nonself-space in order to balance the number of detectors and their coverage while minimizing the overlap of their radii.

  • Optimize computational time through parallelism.

  • Compare NSNAD to other methods, such as subspace clustering [57, 101].

  • Analyze the resulting clusters through postprocessing techniques, such as the quantification-based technique for cluster analysis [45].