1 Introduction

In modern society, network-based services are gaining more and more importance. As technologies such as the IoT, big data, and cloud computing advance, the volume of network traffic is also increasing rapidly. As traffic grows, keeping attack signatures up to date becomes more difficult, time-consuming, and tedious. Hence, as a result of widespread internet use and rapid traffic growth, network security has become an emerging field of research. In this field, researchers try to thwart attackers and intruders who constantly look for flaws in a network or system in order to gain illegal access.

Many solutions exist today to secure a network environment, such as antivirus software, firewalls, and IDS. Among them, the IDS is the most prominent mechanism for defending a network or an individual system.

The IDS also protects sensitive data travelling over a network from being intercepted by attackers or intruders. However, existing IDS are still not sufficiently scalable or flexible. In the years 2014–2016, Yahoo reported two data breaches that affected 500 million customer accounts and resulted in a loss of 350 million dollars [1]. Attacks aimed at stealing data are mounted with the help of intelligent and sophisticated algorithms. Several IDS have been developed over the last few years, but determining whether network traffic is normal or aberrant remains a difficult task. Therefore, several machine learning (ML) algorithms have been introduced and implemented to boost the intelligence of IDS and tackle these challenges [2]. To date, many studies have shown that ML-based IDS perform better in terms of execution and implementation [3]. However, only a few models combine low computational cost with a good detection rate.

Therefore, since network traffic grows at a rapid rate, extracting significant and relevant information from it is a difficult task that must be addressed properly, and the computational cost must be considered at the same time [4].

Moreover, whether the selected features actually improve the performance of the IDS also needs to be investigated.

Hence, one possible way to minimize the computational cost is to identify and select only the relevant features of the dataset that contribute to attack detection. Reducing the dataset dimension lowers the required training time and can simultaneously enhance the performance of the classifier in the IDS [5]. Another possible way to minimize the computational cost is to use only cost-effective algorithms that learn from the data cheaply [6], for example K-nearest neighbours (KNN). Therefore, to minimize the computational cost and increase the performance of the IDS, feature selection approaches (FSA) have been used in this research to remove non-relevant features, and various classifiers have been evaluated to identify the best-performing one. Table 1 lists all the abbreviations used in this paper.

Table 1 Nomenclature

1.1 Contributions

The major contributions of this research are as follows:

  • FSA such as principal component analysis (PCA) and recursive feature elimination (RFE) have been used to discover and select the significant features of the NSL-KDD and CICIDS2017 datasets.

  • A smaller and more appropriate subset of features has been identified, i.e. 13 and 8 key features from the NSL-KDD and CICIDS2017 datasets, respectively.

  • A comparative analysis of different FSA with various classifiers, such as naive Bayes (NB), decision tree (DT), and KNN, on the NSL-KDD dataset is presented.

  • Based on the best classifier and feature selection technique (FST) identified on the NSL-KDD dataset, the same combination has been applied to a real-time dataset, i.e. the combined CICIDS2017 dataset, and its performance has been evaluated in terms of F-measure, G-means, recall (sensitivity), precision, specificity, accuracy, testing time, and training time.

1.2 Organization

The related literature is reviewed in the next section. Section 3 discusses the proposed framework and its approach. The experiments and results are presented in Sect. 4. Finally, the conclusion and future work are given in the last section of this article.

2 Literature survey

Many studies over the previous few decades have used FSA to mitigate the data-dimension problem and improve the IDS detection rate (DR). However, as network traffic grows rapidly, the variety of possible threats increases with it, and researchers still struggle with the issues of dimensionality reduction and computational time [7]. As a result, various ML techniques for IDS combined with FSA have been proposed to date.

Mukkamala et al. [8] examined IDS using support vector machines (SVM) and neural networks (NN). The experiment found that SVM is highly flexible and suitable for huge datasets, whereas NN require a lot of learning time. In this context, in 2004, Fleuret et al. utilized the mutual information approach to choose relevant features; combined with a Bayes network, this approach is more effective than SVM, with an emphasis on overall processing time [9]. In 2005, Chebrolu et al. investigated IDS using a reverse classification tree and Bayes networks as feature selection methods. Using the proposed FST, they extracted 12 essential features capable of recognizing and detecting various attack types; unfortunately, the detection rate for User-to-Root (U2R) attacks was comparatively low [10]. In 2008, Chou et al. used correlation-based feature selection (CFS) and fast CFS as feature selection (FS) methods to handle high-dimensional data issues such as uncertainty, ambiguity, and redundancy in the collected data. To obtain the relevant features, their approach integrated C4.5 and NB. Based on their experiments, they demonstrated that the detection rate of the proposed fuzzy KNN technique improves upon previous classifiers [11].

In this context, Heba et al. used PCA as a reduction technique in combination with SVM to address the challenges of feature-dimension reduction and processing-cost minimization. The experiment demonstrated that IDS performance can be increased with less computational time [12]. Zainal et al. used a DT classifier with filter-based FSA, such as information gain (IG), Chi-square, and relief-F, to examine the KDDcup99 dataset. Out of a total of 41 attributes, the FS methods were used to retain only 5, 10, 15, and 20 relevant features. The results showed that IG as the FST outperformed the other approaches and improved the performance of the model [13]. Revathi et al. explored the effectiveness of various ML algorithms, including random forest (RF), KNN, and artificial neural networks (ANN). They identified 15 key features and built models using RF, KNN, and NN. The results demonstrated that RF performs well in comparison with the others, with an accuracy of 98.88%, whereas RF with all features (without the FST) achieved an accuracy of only 97.94% [14]. Using the NSL-KDD dataset, Kim et al. [15] proposed a hybrid approach for intrusion detection; their results demonstrate that the proposed approach was more effective in terms of detection rate and time complexity. According to [16], the suggested approach is insufficient for time reduction, so future research will concentrate on developing the decision tree approach. In 2015, Jo et al. suggested a DT model that outperforms the NN model, with a detection rate of 91.37% [17]. In the same year, Jebur et al. [18] proposed an approach that combines FS with a fuzzy-genetic IDS; the article uses fuzzy logic to produce rules and represents them with 15 features in order to reduce training time, although the complex computing approach generates less efficient rules than soft computing. Over the UNSW-NB dataset, Mishra et al. proposed a program semantic-aware intrusion detection scheme (Net-visor security) to identify attacks on virtual networks using ML methods such as DT, ANN, linear regression (LR), RF, random tree (RT), and others. Based on their experiments, RF + LR performed better than the others in terms of accuracy but has a higher false positive rate (FPR) than RT + LR [19]. To identify relevant attributes of the KDDcup99 dataset, Mousavi et al. presented an ant colony algorithm and a gradual feature removal method as FST; a model was then built on the selected features using an ensemble of decision trees (AdaBoost classifier). The proposed technique enhanced accuracy significantly and yielded a Matthews correlation coefficient of 0.91 [20]. To select important features of the NSL-KDD dataset, Sah et al. used RFE as the FS method and RF as the classifier; the proposed method enhanced the model's performance to some extent [21]. As a continuation, in 2021, Ankit et al. investigated the impact of FS methods on overall IDS performance. They used the NSL-KDD dataset to implement RFE, IG, and Chi-square as FST with various classifiers, such as NB, SVM, RF, KNN, logistic regression, and ANN. The results, presented as a comparative study, demonstrated that RFE achieved higher performance than the other FST.
However, the entire experiment was performed and tested only on the NSL-KDD dataset, which may not contain modern normal activities according to the literature [22]. Hence, future research should focus on developing and evaluating IDS models using NSL-KDD and more modern datasets such as CICIDS2017. Accordingly, in 2021, Gu et al. [23] suggested a practical IDS method on the NSL-KDD, CICIDS2017, Kyoto 2006+, and UNSW-NB15 datasets that classifies intrusion and regular instances using SVM and NB classifiers. The proposed technique found that embedding NB in SVM produced the maximum detection accuracy compared with a single SVM algorithm. The SVM results also showed that SVM requires a higher training time.

After studying the related works, it has been observed that the majority of researchers are interested in addressing the issue of large data dimensions and in finding relevant features for IDS. It is crucial to note that as data dimensions increase, the processing time of ML approaches grows as well [22]. Multiple FST have been employed in recent years, but these FST are still not flexible enough to extract meaningful information from huge amounts of traffic, and whether the selected features actually increase IDS performance must still be examined. To overcome these problems, effective FSA are required that reduce the features and yield a suitable feature subset. This helps not only in attack detection but also enhances the performance of the IDS and reduces the computational cost. Therefore, a decision-engine approach with a feature-reduction strategy should be established while maintaining lightweight characteristics.

3 Proposed framework

The trade-off between a low computing cost and a high detection rate, together with the high dimensionality of traffic, makes it difficult to develop effective and efficient IDS models. To reduce the computational cost and increase the detection rate of IDS, this study provides an adaptable and effective intrusion detection technique based on FSA. The major objective of the proposed framework is to provide a high detection rate at a minimal computing cost. The proposed framework involves five main steps, namely: dataset, data pre-processing, feature selection approaches, model building and evaluation, and finally the analysis and selection phase, as shown in Fig. 1. The individual steps are described in detail as follows:

Fig. 1

Proposed framework

3.1 Dataset

A standard dataset is essential for measuring IDS performance correctly, and it also allows several estimators or classifiers in an IDS to be compared. In this first step, the standard datasets (CICIDS2017 and NSL-KDD) are described in the subsections below, before the pre-processing steps. These datasets are widely used for IDS and contain a sufficient number of normal activities and attack samples.

3.1.1 NSL-KDD dataset

The NSL-KDD dataset was developed as a refined version of the KDD_1999 dataset [24]. It addresses drawbacks of the KDD_1999 dataset, for example redundant and duplicate records. In the literature, this dataset is used frequently for IDS evaluation and is already divided into two subsets, namely a training set and a testing set. The NSL-KDD dataset has 41 attributes, which are divided into four subgroups: basic features, content features, time-based traffic features, and host-based features. It also has five classes, one for normal traffic and the rest for the attack classes User-to-Root (U2R), Remote-to-Local (R2L), denial of service (DoS), and probing (PROBE), which are described in Table 2. The NSL-KDD dataset is used in this study for the following reasons:

  • Redundant records have been removed from the training and testing sets, which enables the classifier or estimator to produce unbiased results.

  • A sufficient number of objects is available in both the training and testing sets, allowing the experiment to be executed on the entire dataset without having to select small portions at random.

  • It also offers numerous characteristics such as harmful scenarios, realistic network configuration, full packet capture, labelled observations, etc.

Table 2 Attacks and normal classes of NSL_KDD dataset with example

The KDDTrain+ and KDDTest+ files, which contain 125,973 and 22,544 objects, respectively, have been used in this research work; Table 3 lists the different attack-type labels in the training and testing sets.

Table 3 Number of the objects and label distribution of different types of attacks in training and testing set

3.1.2 Combined CICIDS 2017 dataset

The Canadian Institute for Cybersecurity [25] developed the CICIDS2017 dataset for IDS. According to a McAfee report [26], the CICIDS2017 dataset contains a variety of attacks, categorized as Web Attack, Infiltration, DoS, Brute Force, Port Scan, Distributed DoS (DDoS), Botnet attacks, etc., and is distributed across 8 files. The CICIDS2017 dataset has 79 columns: 78 traffic features plus one column representing the label or class.

Researchers have identified a few flaws in the CICIDS2017 dataset: it is easy to see that the dataset is huge, spread over 8 files captured across 5 days, and it contains many duplicate records that may be irrelevant for the IDS training phase. Several possible solutions have been introduced in that context [27]. The dataset also has an imbalanced class distribution [28], which can mislead estimators and bias them towards the majority class. Some of the shortcomings (limitations) of the dataset are as follows:

  • Scattered presence As the dataset is divided into 8 different files, working on each individual file is a monotonous task.

  • An enormous volume of data After integrating all eight files, the resulting set becomes very large, and working on it is tedious because loading and processing the data takes more time.

  • Missing values The dataset has many missing values that have to be removed before working on it.

    An effective IDS model should be capable of detecting any type of attack. Therefore, in order to design such an IDS model, the data of all files of the CICIDS2017 dataset are collected and merged into a single dataset. As a result, the single dataset contains 3,119,345 objects in total, of which 288,602 objects with missing class labels are removed. The combined dataset is thus left with 2,830,743 objects.

  • Dimension of combined CICIDS2017 dataset: (2,830,743, 79).

Merging all traffic files of CICIDS2017 into a single dataset solves the scattered-presence problem, and the missing values have been eliminated from the combined dataset. Because the combined CICIDS2017 dataset becomes enormously large after integrating all 8 files, a sample of 654,321 records has been picked for the experiments in this research. Table 4 shows the updated labelling of all attack traffic in the CICIDS2017 dataset. The dimensions of the training and testing sets are 523,456 and 130,865 records, respectively, as shown in Table 5. Table 6 describes the dimension of each attack class in the sample of the combined CICIDS2017 dataset used for implementation.
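As a minimal sketch of this data-assembly step (not the authors' exact code), the eight daily CSV files could be merged, cleaned of unlabelled rows, and sampled with pandas as follows; the folder path, label-column name, and random seed are assumptions made for illustration.

```python
# Sketch of assembling the combined CICIDS2017 dataset.
# Assumptions: CSVs live in ./CICIDS2017/, the class column is named "Label".
import glob
import pandas as pd

files = sorted(glob.glob("CICIDS2017/*.csv"))            # the 8 daily capture files
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

label_col = "Label"                                      # assumed; some releases prefix it with a space
combined = combined.dropna(subset=[label_col])           # drop objects with missing class labels

sample = combined.sample(n=654_321, random_state=42)     # working sample quoted above
print(combined.shape, sample.shape)
```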

Table 4 All possible types of attacks and normal traffic with new labels in the CICIDS2017 dataset
Table 5 Record of different types of Attack and normal traffic in training and testing set of sample of combined CICIDS2017 dataset
Table 6 Dimension of each attacks class (including normal objects) in the dataset used for the implementation

3.2 Data pre-processing

The dataset must be pre-processed before the models and methods can be verified. Several operations are performed in this phase, including the replacement of noisy values such as infinity or null symbols with means or zeros, feature transformation, normalization, and splitting. Both datasets require pre-processing.

3.2.1 One-hot encoding

One-hot encoding is used to transform non-numerical data into binary vectors. As stated before, the CICIDS2017 dataset has one column defining the label or class and 78 regular attributes, among which Fwd Header Length, flow packets, and flow bytes always carry the same entries. As a result, the flow packets and flow bytes features are eliminated from the combined CICIDS2017 dataset, leaving 76 attributes plus one label column for analysis. Because all of these attributes are numerical, no data transformation is required. On the other hand, the NSL-KDD dataset has both non-categorical (numeric) and categorical (non-numeric) features, so data transformation is required. Categorical features of the NSL-KDD dataset such as flag, service, and protocol type contain symbolic entries that are transformed into numerical values using LabelEncoder; for example, the protocol-type feature contains three categories (UDP, TCP, and ICMP), which are mapped to the numeric values 1, 2, and 3. These numerical values are then represented as binary vectors for training and testing using one-hot encoding.
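A minimal sketch of this transformation (not the authors' exact code) is given below for the protocol-type column; note that scikit-learn's LabelEncoder assigns 0-based codes rather than the 1–3 mapping quoted above.

```python
# Label-encode the symbolic protocol_type values, then one-hot encode them.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp", "tcp"]})

codes = LabelEncoder().fit_transform(df["protocol_type"])     # e.g. icmp=0, tcp=1, udp=2
onehot = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()
print(onehot)                                                 # one binary vector per record
```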

3.2.2 Splitting the datasets

The combined CICIDS2017 dataset has been split into six (6) different parts, one for each attack category, whereas the NSL-KDD dataset has been split into four (4) different parts based on the attack types U2R, R2L, probe, and DoS, so that the models can be trained and tested correctly for all types of attacks.
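A hedged sketch of this per-category partitioning is shown below; the frame name, column name, and category labels are illustrative placeholders rather than the exact labels used in the datasets.

```python
# Build one sub-dataset per attack family, each paired with normal traffic.
NSL_KDD_CATEGORIES = ["dos", "probe", "r2l", "u2r"]      # illustrative label names

def per_category_parts(df, categories, label_col="label", normal="normal"):
    parts = {}
    for attack in categories:
        mask = df[label_col].isin([normal, attack])      # keep this family plus normal traffic
        parts[attack] = df[mask].copy()
    return parts

# parts = per_category_parts(nsl_train_df, NSL_KDD_CATEGORIES)
```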

3.2.3 Feature normalization

The next operation after pre-processing is feature normalization using the standardization formula given in Eq. (1) [29], where Z represents the Z-score. Feature normalization brings all attributes onto the same scale and prevents features with large numeric values from receiving more importance in classification algorithms; as a result, the classifier assigns the same weight to every feature. In addition, the linear transformation given in Eq. (2) is used to map each feature's values into the range (0–1) [30].

$$ Z = ( B - \mu )/ \sigma $$
(1)

In Eq. (1), the mean value (µ) is subtracted from the feature value (represented by ‘B’), and the result is divided by the standard deviation (σ). In Eq. (2), min and max stand for the minimum and maximum values of the feature, respectively.

$$ B_{{{\text{normalization}}}} = (B - min(B)) / (max(B) - min(B)) $$
(2)
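Both normalizations map directly onto scikit-learn's scalers. The sketch below assumes plain NumPy feature matrices (the values are made up); fitting on the training split and reusing the learned statistics on the test split is the usual way to avoid information leakage.

```python
# Eq. (1): Z-score standardization; Eq. (2): min–max scaling into [0, 1].
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[2.0, 100.0], [4.0, 300.0], [6.0, 500.0]])
X_test  = np.array([[3.0, 200.0]])

z_scaler = StandardScaler().fit(X_train)        # learns mu and sigma per feature
X_train_z, X_test_z = z_scaler.transform(X_train), z_scaler.transform(X_test)

mm_scaler = MinMaxScaler().fit(X_train)         # learns min and max per feature
X_train_mm, X_test_mm = mm_scaler.transform(X_train), mm_scaler.transform(X_test)
```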

3.3 FSA

FSA are used to remove redundant, irrelevant, or unimportant data. Their main purpose is to obtain a subset or an optimal set of important features from the underlying features that can easily separate the given data into different classes or labels. FSA help in handling high-dimensional datasets and compute the importance of each feature, which supports data interpretation.

As stated in Sect. 3.2.1, the flow packets and flow bytes features have been eliminated from the combined CICIDS2017 dataset in our experiment, so 77 attributes remain to be analysed, while the NSL-KDD dataset keeps its 41 features. In this phase, PCA, univariate feature selection using the analysis of variance (ANOVA) F-test, and RFE are used as FSA to reduce the features and acquire an appropriate subset or optimal set of features from the original set. These approaches are described in detail below.

3.3.1 PCA

The PCA [31] method is similar to clustering in that it belongs to the category of unsupervised learning. PCA reduces the complexity of high-dimensional data while maintaining its trends and patterns by projecting the data onto fewer dimensions, which then act as summaries of the original features. It identifies patterns in the data without any reference to whether the samples come from different treatment groups or have phenotypic differences. PCA is primarily used to reduce the number of attributes of a dataset by transforming a large set of variables into a smaller one that still retains most of the information in the dataset.
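A short sketch of this step with scikit-learn is shown below, assuming standardized feature matrices X_train and X_test; the component count of 8 follows what is reported later for PCA on NSL-KDD, and everything else is illustrative.

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=8)                    # keep the 8 leading principal components
X_train_pca = pca.fit_transform(X_train)     # learn the components on the training data
X_test_pca = pca.transform(X_test)           # project the test data onto the same axes

# Fraction of the original variance retained by the reduced representation.
print(pca.explained_variance_ratio_.sum())
```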

3.3.2 Univariate feature selection

In univariate feature selection [32] using the ANOVA F-test [33], each attribute is analysed independently to identify the strength of its relationship with the labels or classes. The best attributes are picked on the basis of a univariate statistical test. Here, ANOVA compares each attribute with the target class to determine whether a statistically significant connection exists between them; during this procedure, all other features are set aside while the test score for that feature is obtained. In the end, the scores of all features are compared and the top-scoring features are selected.
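A minimal sketch of this scoring step, assuming a feature matrix X_train and a label vector y_train, uses scikit-learn's f_classif, which returns one ANOVA F-score and p-value per feature:

```python
from sklearn.feature_selection import f_classif

f_scores, p_values = f_classif(X_train, y_train)

# Rank features by F-score; the top-scoring ones are retained.
ranking = sorted(enumerate(f_scores), key=lambda item: item[1], reverse=True)
print(ranking[:5])      # (feature index, F-score) of the five best features
```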

3.3.3 Percentile method

The Percentile method, or selectPercentile, of the Sklearn library [34] selects attributes that fall within a given percentile of the highest scores. The default scoring function in selectPercentile is the ANOVA F-test, which is applicable only to classification tasks.
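As a hedged illustration (the percentile value of 20 is an assumption, not the paper's setting), selectPercentile can be applied as follows:

```python
from sklearn.feature_selection import SelectPercentile, f_classif

selector = SelectPercentile(score_func=f_classif, percentile=20)   # keep the top 20% of features
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

print(selector.get_support(indices=True))    # indices of the retained features
```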

3.3.4 RFE

After the Percentile method, RFE [35] is used as the FST to identify and pick the important features for classifying the network traffic. RFE selects and eliminates features on the basis of their ranks, removing one lowest-ranked attribute at a time. Its main purpose is to acquire the best-performing subset of features. The RFE [36] method evaluates the performance of an estimator or classifier through the following iterative elimination procedure (a code sketch follows the list):

  • Build a classification model on a candidate subset of features.

  • Compute the importance of the features to obtain their ranks.

  • Eliminate the lowest-ranked features based on their relevance.
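The sketch below illustrates this loop with scikit-learn's RFE; a decision tree is used as the ranking estimator and the target of 13 features matches the count reported for RFE on NSL-KDD, while the remaining settings are illustrative assumptions.

```python
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

rfe = RFE(estimator=DecisionTreeClassifier(random_state=42),
          n_features_to_select=13,   # target subset size (as reported for NSL-KDD)
          step=1)                    # drop the lowest-ranked feature in each round
rfe.fit(X_train, y_train)

print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # rank 1 = selected; higher ranks were eliminated earlier
```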

3.4 Model building and evaluation

In this phase, DT, NB, and KNN classifiers are used to build models using both the reduced feature sets (obtained by the FSA) and all features of the NSL-KDD training set. On the NSL-KDD testing set, the recall, precision, accuracy, and F-measure metrics are then calculated to determine the prediction quality of these models. During sampling, learning, and validation, 10-fold cross-validation is performed to measure the performance of the models so that every object contributes to the evaluation.

Subsequently, the best classifier is chosen based on its performance on the NSL-KDD dataset, and the same classifier is trained on the CICIDS2017 training set using both the reduced features (from the best FST) and all features. This model is then evaluated on the CICIDS2017 testing set. The working principles of these classifiers are described in the following subsections.
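A condensed sketch of this model-building step is given below, assuming the binary per-category labels from Sect. 3.2.2; the hyper-parameters are scikit-learn defaults (with k = 5 for KNN) and are illustrative rather than the authors' exact settings.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)   # 10-fold cross-validation
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring="f1")
    print(f"{name}: mean F-measure = {scores.mean():.4f}")
```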

3.4.1 DT

The DT classifier [37] is tree-structured and comprises nodes and edges. Each node denotes the problem category that needs to be classified, whereas each edge represents a decision taken based on the evaluated data. Such trees can be either regression trees or classification trees. The DT classifier can be viewed as a predictive ML model that maps dataset attributes to their corresponding values; each branch represents the possible values of a given category. The tree nodes are chosen using the estimated entropy of the dataset attributes, and the attribute yielding the highest information gain (the largest entropy reduction) becomes the root node. Widely adopted DT models include classification and regression trees (CART), C4.5, and Iterative Dichotomiser 3 (ID3).

The DT classifier has several advantages: it is simple and easy to understand, with short explanations; inferences can be derived on the basis of different probability estimates and costs and used to obtain detailed outputs; and it can flexibly be combined with other classification models to obtain correct results. However, it also has limitations: when the data items are very similar, its accuracy is relatively low, and it is not adaptive, meaning that minor modifications in the data fed to the estimator may lead to a highly unstable decision tree.

3.4.2 NB

NB is a supervised learning method based on the Bayes theorem. The NB method is premised on the assumption that the presence of one feature is independent of the other features of a class. The Bayes theorem is applied to calculate the posterior probability P(cl | y) from P(y | cl), P(y), and P(cl), as given in Eq. (3) [38]:

$$ {\text{P}}({\text{cl}}|{\text{y}}) = \frac{{{\text{P}}({\text{y}}|{\text{cl}}) \cdot {\text{P}}\left( {{\text{cl}}} \right)}}{{{\text{P}}\left( {\text{y}} \right)}} $$
(3)

where P(cl | y) is the posterior probability of the class (cl, the target) given the predictor (y, the features). The prior probabilities of the predictor and the class are denoted by P(y) and P(cl), respectively, and P(y | cl) is the likelihood, i.e. the probability of the predictor given the class.
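A tiny numerical illustration of Eq. (3) with Gaussian naive Bayes is sketched below; the feature values are made up purely for demonstration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.5], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])          # 0 = normal, 1 = attack

nb = GaussianNB().fit(X, y)
# predict_proba returns the posterior P(cl | y) for every class.
print(nb.predict_proba([[1.1], [5.2]]))
```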

The advantages of the NB algorithm are that it is extremely scalable and fast in classification, and it can be used for both binary and multi-class classification problems. Because it assumes that the features are mutually independent, it cannot model any relationship among the features of a class, and its implementation becomes more complex with large datasets.

3.4.3 KNN

The KNN algorithm is one of the most basic supervised ML classifiers. It assumes that new data are similar to existing data and assigns new data to the category they most closely resemble. It stores all existing data and categorizes a new data point based on its similarity to them; whenever new data arrive, they can thus be effortlessly assigned to the pertinent group. KNN can be used for regression as well as classification, but it is most often used for classification problems [39]. It is a non-parametric procedure, which means that it does not make any assumption about the underlying data.

3.5 Analysis and selection phase

After developing the models, the last phase evaluates them on the NSL-KDD dataset using FSA such as PCA and RFE with the classifiers KNN, DT, and NB. Models built with all features are also considered for comparison. The performance of these models is measured using recall, precision, F-measure, and accuracy to find the best classifier and the most suitable FST on the NSL-KDD dataset. The best classifier and best FST are then also applied to the CICIDS2017 dataset for model building and analysis.

3.5.1 Evaluation metrics

Evaluation metrics such as F-measure, recall, precision, accuracy, training time, testing time, specificity, and G-means (the latter being particularly appropriate for imbalanced datasets) are used to analyse and measure the models' performance. The basic quantities used to compute these metrics are defined as follows:

  • True positive (TP) TP denotes the number of normal objects which are successfully categorized by the model as normal.

  • False positive (FP) FP indicates the number of normal samples that are wrongly classified by the model as attacks.

  • True negative (TN) TN specifies the number of attack samples that are correctly classified (predicted) by the model as attacks.

  • False negative (FN) FN denotes the number of attack samples that are mistakenly classified (predicted) by the model as normal.

3.5.1.1 Accuracy (A)

Accuracy [40] measures how accurately the IDS model predicts the traffic as normal or attack. It is given by the formula

$$ A = \frac{{{\text{TN}} + {\text{TP}}}}{{{\text{TN}} + {\text{TP}} + {\text{FP}} + {\text{FN}}}} $$
3.5.1.2 G-means (G_m)

G_m [41] is derived from specificity and sensitivity. It is mainly appropriate for imbalanced datasets. It is computed as

$$ G\_{\text{m}} = \sqrt {\left( {{\text{specificity}} \times {\text{sensitivity}}} \right)} $$
3.5.1.3 Specificity (S)

Specificity is another name for the true negative rate. It is represented by the given formula

$$ S = \frac{{{\text{TN}}}}{{{\text{FP}} + {\text{TN}}}} $$
3.5.1.4 Recall (sensitivity)

True positive rate (TPR) and detection rate are other terms for recall. It is calculated using the formula

$$ {\text{Recall}}\left( R \right) = \frac{{{\text{TP}}}}{{{\text{FN}} + {\text{TP}}}} $$
3.5.1.5 Precision (P)

Precision is the ratio of true positives (cases correctly predicted by the model as positive) to the total number of predicted positive cases [41]. It is calculated by the formula

$$ P = \frac{{{\text{TP}}}}{{{\text{FP}} + {\text{TP}}}} $$
3.5.1.6 F-measure (F_m)

F_m is the weighted harmonic average of recall and precision. It is mainly suitable for imbalanced datasets and can be calculated by the formula [42]

$$ F\_{\text{m}} = 2*(R* P)/(R + P) $$
3.5.1.7 Training time (T1) in seconds (s)

T1 describes the amount of time that a technique takes to train and build the model on the entire training set of a dataset. It is given by the formula [43]

$$ T_{1} = {\text{End}}_{{{\text{training}}\_{\text{time}} }} - {\text{Start}}_{{{\text{training}}\_{\text{time}} }} . $$
3.5.1.8 Testing time (T2) in seconds (s)

T2 describes the amount of time that a technique utilizes to predict the whole testing set of a dataset as either attack or normal. It is computed as [43]

$$ T_{2} = {\text{End}}_{{{\text{testing}}\_{\text{time}} }} - {\text{Start}}_{{{\text{testing}}\_{\text{time}} }} . $$
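The sketch below shows how these quantities can be obtained for one binary (attack-vs-normal) part of the data; it treats label 1 as the positive class, so the counts should be read against whichever convention (normal or attack as positive) is adopted above, and the helper name is illustrative.

```python
import time
from math import sqrt
from sklearn.metrics import confusion_matrix

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit, predict, and report the metrics of Sect. 3.5.1 for binary labels {0, 1}."""
    start = time.time()
    model.fit(X_train, y_train)
    t1 = time.time() - start                    # training time T1 in seconds

    start = time.time()
    y_pred = model.predict(X_test)
    t2 = time.time() - start                    # testing time T2 in seconds

    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)                # sensitivity / detection rate
    specificity = tn / (tn + fp)
    f_measure   = 2 * precision * recall / (precision + recall)
    g_means     = sqrt(specificity * recall)
    return {"A": accuracy, "P": precision, "R": recall, "S": specificity,
            "F_m": f_measure, "G_m": g_means, "T1": t1, "T2": t2}
```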

4 Experimental setup and results analysis

For the experiments, the Kaggle platform has been used in this research; it is a cloud-based online resource where Python programs can be run using 'Sklearn' (an ML library implemented in Python) [44]. Kaggle offers a maximum memory of 16 gigabytes (GB) and a storage capacity of 4.9 GB, allowing users to upload data and explore data-analysis models. The entire experiment has been performed under Windows 10 with a quad-core 3.6 gigahertz (GHz) processor.

4.1 Results and discussion

In this research, the CICIDS2017 and NSL-KDD datasets have been used to investigate the effectiveness of the proposed models. Initially, the RFE- and PCA-based FSA have been employed on the datasets to remove non-relevant features and find the important ones. These FSA have then been combined with several ML classifiers, namely DT, NB, and KNN, to enhance and measure the models' performance on the NSL-KDD dataset. The accuracy, recall, precision, and F-measure metrics stated in Sect. 3.5 have been used to analyse the experimental outcomes. Based on these outcomes, further metrics such as specificity, G-means, testing time, and training time have also been calculated to identify the best-performing classifier. Finally, the best-performing classifier with the selected FST has been examined on a real-time dataset (i.e. CICIDS2017).

4.1.1 Results analysis using NSL-KDD dataset

Since an IDS enables early detection of intrusions, feature extraction and selection is always a crucial and difficult task in network security, and it has a considerable impact on both model performance and computational complexity. The main goal of feature selection approaches is to represent a problem completely by picking a subset of important features from the whole dataset; as a result, working with fewer features may yield better outcomes. Therefore, in this experiment, RFE and PCA have been applied to the NSL-KDD dataset to obtain the best set of features.

In order to reduce the computational cost of the proposed models, RFE and PCA have identified the 13 and 8 most important features (rank-wise), respectively, of the NSL-KDD dataset for each classifier, as shown in Table 7. These approaches (PCA and RFE) provide an appropriate set of features that are passed to the classifiers (DT, NB, and KNN) for training and testing in order to construct IDS models for evaluation and comparison purposes. Tables 9 and 10 present the performance of the classifiers with the features selected by RFE and PCA, respectively, on the NSL-KDD dataset, whereas Table 8 presents the performance of the classifiers with all features. The performance of the classifiers has been measured using four metrics, namely recall, precision, F-measure, and accuracy.

Table 7 Selected features of NSL-KDD dataset using RFE and PCA

Figures 2, 3, 4, and 5 illustrate the F-measure, recall, precision, and accuracy of the different algorithms (DT, KNN, and NB) using all features and using the features selected via RFE and PCA, respectively. The Y-axis of each graph depicts the value of the performance metric (F-measure, recall, precision, or accuracy) achieved for each category (Probe, DoS, U2R, R2L), while the X-axis lists the methods DT, NB, and KNN.

Fig. 2

Different classifiers (NB, DT, and, KNN) F-measure using selected features and all features for each category

Fig. 3

Different classifiers recall using selected features and all features for each category

Fig. 4

Different classifiers precision using selected features and all features for each category

Fig. 5

Different classifiers accuracy using selected features and all features for each category

After analysing the results (Tables 8, 9, 10, Figs. 2, 3, 4, 5) on the NSL-KDD dataset with and without FSA (i.e. using all features), it has been found that RFE with DT offers superior performance in terms of accuracy. Additionally, RFE with DT offers higher performance in terms of F-measure, recall, and precision for the DoS, Probe, and R2L attack types. However, KNN with all features provides better F-measure and precision for the U2R attack type, and NB with all features shows better recall for the U2R attack type.

Table 8 Performance evaluations for ML classifiers with all features using NSL-KDD dataset
Table 9 Performance evaluations for ML classifiers with selected features using RFE on NSL-KDD dataset
Table 10 Performance evaluations for ML classifiers with selected features using PCA on NSL-KDD dataset

Further study of the findings for NSL-KDD (Tables 8, 9, 10, Figs. 2, 3, 4, 5) shows that, among the FSA, RFE with DT performs better than PCA with DT, PCA with NB, and PCA with KNN in terms of F-measure, precision (with the exception of the U2R attack type for PCA with KNN), recall, and accuracy.

However, the KNN classifier performs well in terms of precision under a few specific conditions, specifically for the U2R attack type, and in one instance the NB classifier produces similar recall, again only for the U2R attack type. Hence, for the majority of the attack types in the NSL-KDD dataset, the DT classifier with RFE offers higher accuracy, F-measure, precision, and recall. Therefore, DT as the classifier and RFE as the FST are adopted in the proposed model for further investigation, and the additional metrics specificity, G-means, testing time, and training time are also calculated for DT with RFE. Table 11 presents the performance of the DT classifier with all features and with the features selected by RFE on the NSL-KDD dataset. Figure 6 illustrates the evaluation metrics of the DT classifier using the selected features (by RFE) and all features for each category, and Fig. 7 shows the corresponding training and testing times. Compared with DT using all features, the DT classifier with the selected features (by RFE) requires less training and testing time, and the U2R category requires the lowest training and testing time.

Table 11 Performance evaluations for DT classifier with all and selected features using RFE on NSL-KDD dataset
Fig. 6

Evaluation metrics of DT classifiers using selected features (by RFE) and all features for each category on the NSL-KDD dataset

Fig. 7

DT classifier’s training and testing time using selected and all features for each category on the NSL-KDD dataset

After analysing the results (Table 11, Figs. 6, 7) of DT with selected features (by RFE) and DT with all features on the NSL-KDD dataset, it is observed that the DT classifier with selected features (by RFE) yields better results in terms of average accuracy, total training time, total testing time, G-means, and specificity. The analysis also shows that DT with RFE improves the model's precision, G-means, accuracy (approximately identical for U2R), and specificity for each attack category while reducing the computational cost in terms of training and testing time. In terms of recall and F-measure, DT with RFE also offers better results for the DoS, R2L, and Probe attack categories; for the U2R category, its F-measure and recall show only slight changes.

4.1.2 Results analysis using combined CICIDS2017 dataset

After analysing the results on the NSL-KDD dataset, the best-performing classifier with the appropriate FST, that is DT + RFE, has been adopted and evaluated on a real-time dataset (i.e. CICIDS2017) to examine the model's sustainability on a newer real-time IDS dataset. Table 12 shows the 8 important features selected from the combined CICIDS2017 dataset using RFE. These features have then been passed to the DT classifier to build a model. Performance evaluations of the DT method with all features and with the features selected by RFE on the combined CICIDS2017 dataset are shown in Tables 13 and 14.

Table 12 Selected features from the combined CICIDS2017 dataset using RFE
Table 13 Performance evaluations for DT classifier with all features on CICIDS2017 combined dataset
Table 14 Performance evaluations for DT classifier with selected features (by RFE) on CICIDS2017 combined dataset

On the combined CICIDS2017 dataset, Fig. 8 shows the performance of the DT classifier with selected features (by RFE) and with all features for each category in terms of accuracy, precision, F-measure, recall, specificity, and G-means. Moreover, Fig. 9 displays the training and testing times of the DT classifier using the selected and all features for each category.

Fig. 8

Evaluation metrics of the DT classifier with selected features (by RFE) and all features for each category on the combined CICIDS2017 dataset

Fig. 9

DT classifier’s training and testing time using selected and all features for each category on combined CICIDS2017 dataset

After analysing the results (Tables 13, 14, Figs. 8, 9) of DT with selected features (by RFE) and DT with all features on the combined CICIDS2017 dataset, it is observed that the DT classifier with selected features (by RFE) yields better results in terms of average accuracy (approximately identical accuracy for Botnet), total training time, total testing time, G-means (with the exception of the Botnet attack type, where DT with all features is better), and specificity. Additionally, RFE with DT offers higher F-measure, recall, and precision for the Web Attack, Port Scan, DoS/DDoS, and Infiltration attack types.

4.1.3 Comparison

In this study, multiple ML techniques have been combined with various FSA to reduce the number of attributes in the datasets, which can help develop IDS at a lower cost but with improved performance. Tables 15 and 16 give a thorough comparison, on different datasets, between the proposed model and various ML classifiers that use FSA in IDS models to detect different sorts of attacks. Moreover, to compare the results of the proposed model with the others (e.g. [45, 47, 56, 59] and [61]), the average accuracy and the total training and testing times on NSL-KDD and CICIDS2017 are given in Tables 15 and 16, respectively. In the KNN classifier-based model [62], only 4 classes (Brute Force, Cross-site scripting (XSS), SQL injection, BENIGN) of CICIDS2017 were considered, with a reported total training time of 11.130 s. In contrast, the proposed model considers 6 classes, which cover almost all types of attacks present in the CICIDS2017 dataset, and its total training time is 4.42 s.

Table 15 A comparison of ML algorithms for the IDS model utilizing FSA on NSL-KDD
Table 16 A comparison of ML algorithms for the IDS model utilizing FSA on CICIDS2017

5 Conclusion and future work

This paper examines various classifiers with different FSA in order to construct an effective IDS model. The analysis shows that reducing the data dimension in IDS not only decreases processing cost but also enhances model performance. According to the results on the NSL-KDD dataset, RFE as the FST with DT as the classifier produces better recall, precision (except for the U2R attack category), accuracy, and F-measure than the other classifier/FSA combinations. Moreover, the chosen FST identified a smaller, more appropriate subset of features based on information gain and ranking techniques for the classifier: 13 significant features in the NSL-KDD dataset and 8 relevant features in the CICIDS2017 dataset. This helps to increase the model performance at a lower computational cost than a model with all features. The proposed model (RFE + DT) has been evaluated over the combined CICIDS2017 dataset in terms of F-measure, recall, specificity, precision, G-means, accuracy, testing time, and training time. To demonstrate its efficiency and effectiveness, the proposed model has been compared with other well-known models published in the literature. It has been found that using DT as the classification technique and RFE as the feature selection method reduces the computational cost and improves performance.

Future studies may focus on the application of various ML algorithms, such as unsupervised and supervised models, across various IDS-related datasets. The effectiveness of hybrid FST, which combine statistical approaches and meta-heuristics, in selecting features for attack detection will also be a subject of future studies, because it remains a relatively unexplored area.