Pragmatic Analysis of Machine Learning Techniques in Network Based IDS

Nehra, Divya; Kumar, Krishan; Mangat, Veenu

doi:10.1007/978-981-15-0108-1_39

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1075)

Included in the following conference series:

International Conference on Advanced Informatics for Computing Research

Abstract

Network intrusion detection systems (NIDS) play an essential role in defending computer networks. To cope with the demands of contemporary networks, concerns such as performance evaluation must be taken into consideration. This work presents a pragmatic analysis of machine learning techniques for network-based IDS. Performance is analyzed over two benchmark datasets, KDD-Cup’99 and NSL-KDD, using five supervised machine learning techniques (RFC, Naïve Bayes, J48, Bayes Net and SVM). To assess the performance of the network-based intrusion detection system, accuracy, precision, recall and F1-score have been computed and analyzed. The results suggest that no single technique is able to identify all attack classes at an acceptable level, and most of the techniques gave poor results for the minority attack class(es). To evaluate the supervised classifiers, a blind set of experiments with 10-fold cross validation has been performed. The results are promising and provide a new direction for researchers in the intrusion detection domain.

1 Introduction

Network-based IDS are software deployed at a planned point within a network to analyze traffic across the whole subnet. The traffic log is matched against a database of known attacks, and if an attack is spotted or a security policy violation is detected, an alert is sent to the network administrator. NIDS are classified as on-line and off-line. On-line NIDS work with real-time network traffic, whereas off-line NIDS work over a repository of stored data and analyze it to separate attack instances from normal instances.

Recently, researchers' attention has shifted towards machine learning and neural network techniques such as Random Forest, Support Vector Machines, Naïve Bayes and Decision Trees [1]. These techniques have been achieving improved detection accuracy for network intrusion detection systems. Machine learning has given a whole new meaning to the field of intrusion detection when used to its full potential [2, 3].

The strategy proposed in this work addresses the improvements required in the field of intrusion detection.

2 Background

This section provides the background material needed to understand the motivation and the idea behind the proposed work.

2.1 Network Intrusion Detection System

Nowadays, organizations depend increasingly on ever more demanding applications of information technology. As a result, service provider software is more prone to vulnerabilities, and the resulting errors are expensive to resolve. This scenario creates the need for a strong network monitoring system that can deal with the following pertinent concerns:

Dimensionality of data: The volume of data stored on and passing over networks is growing massively and will continue to grow. According to the forecast in [4], the amount of data will reach 44 ZB by 2020. Deploying NIDS to deal with such a large amount of data is a major challenge.
Reliability: Existing techniques fall short of the desired levels of reliability in terms of accuracy. More granular datasets and better visualization of data are therefore required to achieve more promising results.
Mélange: Current practice focuses on developing ensemble and customized protocols using various algorithms and network attributes. Consequently, distinguishing nefarious from normal behavior is becoming a cumbersome task.
Imbalanced datasets: This problem arises when a dataset contains classes with very few instances. As a result, the NIDS is unable to predict such classes precisely and becomes more prone to errors.

2.2 Machine Learning

According to Wikipedia, machine learning is a subfield of artificial intelligence within computer science that gives computers the ability to “learn” from data using statistical techniques, without being explicitly programmed [5]. In other words, machine learning is programming computers to optimize a performance criterion using past experience or example data. Machine learning uses the theory of statistics to build mathematical models, because the core task is making inference from a sample. Examples of machine learning applications include basket analysis using learned associations (for example, that 70% of customers who buy bread also buy butter); classification, in which two or more classes are present and a learning algorithm predicts the appropriate class of an instance; and pattern recognition, which includes face recognition, medical diagnosis and speech recognition [6]. Machine learning algorithms are divided into the following types:

Supervised learning: the aim is to learn the mapping from input to output, where the correct labels or results are provided by a supervisor [7].
Unsupervised learning: no supervisor is present and only the input is provided. The goal is to discover regularities in the input. Clustering is used here to group similar patterns [8].
Reinforcement learning: the output is a sequence of actions, and what matters is the policy by which that sequence reaches the goal. The aim is to assess the goodness of policies and to generate a policy [6, 9].

3 Existing Work

In this section, the most prominent recent works are discussed.

The goal of NIDS using machine learning is to generate a minimal rule set that detects malicious actions deviating from past behavior. There is a considerable body of existing work on network IDS. The authors of [8] propose a new method for network intrusion detection and report FNR = 1.15%, FPR = 0.09% and a detection accuracy of 98.76%, compared with FPR = 4.2%, FNR = 7.77% and a detection accuracy of 88.03% for another SVM-based scheme. The authors of [10] propose a machine learning model for NIDS by comparing five machine learning based models, and report a detection accuracy of 99.4%. They compare results with and without a reduced feature set, and also compare 10-fold cross-validation results with percentage-split results.

In [11], a machine learning based approach using SVM with augmented features is proposed. The authors apply the marginal density ratio transformation to obtain an improved detection rate for SVM. The dataset used is NSL-KDD, and the results show robust performance. In [12], an IDS is proposed on the basis of a performance comparison between SVM, RFC and ELM to resolve performance concerns. The study examines the limitations posed by large datasets and heavy traffic and identifies an efficient classification technique. The authors of [13] analyzed methods for handling imbalanced datasets and concluded that minority classes are learned less well than majority classes. The authors of [14] discussed the problems of learning with skewed class distributions and their effect on classifier performance. The analysis was conducted for artificial intelligence and computational intelligence and confirms the need to build efficient intrusion detection systems. In [23], artificial neural networks, decision trees, support vector machines, Bayesian networks and self-organizing maps are analyzed. Although machine learning has achieved high and desirable results, it still has vulnerabilities, such as misclassification of network data caused by poisoning of the learning process. Such vulnerabilities degrade system performance and therefore need to be addressed.

4 Classifier Used

In the proposed work, the following five algorithms have been applied to two datasets, KDD Cup’99 and NSL-KDD. A 10-fold cross-validation (CV) approach has been applied with the help of scikit-learn.

Random Forest Classifier: these classifiers belong to the family of ensembles, or forests, of decision trees. Individual decision trees generally have low bias and high variance, which makes them good candidates for ensemble methods. Bootstrap aggregating, or bagging, is generally used in this classifier to reduce variance without substantially altering the bias [15].
J48: a predictive learning technique that makes predictions for a new instance on the basis of previously available information. It builds a decision tree from the attribute values of the available data [16].
Naïve Bayes: these classifiers belong to the family of probabilistic classifiers and use Bayes' rule of conditional probability. Naïve Bayes treats each feature individually, as independent of the other features in the model [17].
SVM: these classifiers are reported to be well suited to multiclass classification problems on large datasets and to be among the faster machine learning classifiers, with low computational requirements [18]. This family supports classification as well as regression.
BayesNet: these classifiers are a subset of Bayesian networks, restricted to nominal attributes and no missing values [19].
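
The 10-fold cross-validation protocol named at the start of this section can be sketched as follows. This is only an illustration of the splitting and averaging steps, not the authors' code: it uses the Python standard library rather than scikit-learn, the helper names `k_fold_indices` and `majority_class_accuracy` are hypothetical, the majority-class predictor is a placeholder standing in for any of the five classifiers, and the label vector is made up.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def majority_class_accuracy(labels, k=10):
    """Mean accuracy over k folds of a placeholder majority-class model."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        # "Training": remember the most frequent label in the training folds.
        majority = Counter(labels[j] for j in train_idx).most_common(1)[0][0]
        # "Testing": count how many held-out records carry that label.
        correct = sum(1 for j in test_idx if labels[j] == majority)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Made-up labels for illustration: 90 normal records and 10 attack records.
toy_labels = ["normal"] * 90 + ["attack"] * 10
cv_accuracy = majority_class_accuracy(toy_labels, k=10)
print(f"mean 10-fold accuracy of the placeholder model: {cv_accuracy:.2f}")
```

Every record is used for testing exactly once and for training in the other nine rounds, so the averaged score does not depend on a single train/test split.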

5 Calculations

Like most of the existing research, the proposed work was implemented in Python. All evaluation was performed on 64-bit Windows 10 Pro with an Intel® Core™ i5-8250 CPU @ 1.60 GHz 1.80 GHz, 8.00 GB RAM and an NVIDIA GeForce MX150 GPU. Two benchmark datasets from the intrusion detection domain, KDD Cup’99 and NSL-KDD, are used for performance evaluation.

The metrics are built from the following counts:

True Positive (TP) – occurrences that are correctly categorized as an intrusion.
False Positive (FP) – occurrences that are incorrectly categorized as an intrusion.
True Negative (TN) – occurrences that are correctly categorized as normal.
False Negative (FN) – occurrences that are incorrectly categorized as normal.
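
As a minimal sketch of how these four counts are obtained, the snippet below tallies them from a list of true labels and a list of predicted labels. The function name `confusion_counts`, the label strings and the six toy records are assumptions made for illustration and do not come from the paper's experiments.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Return (TP, FP, TN, FN), treating `positive` as the intrusion class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # intrusion correctly flagged
            else:
                fp += 1  # normal record wrongly flagged as intrusion
        else:
            if truth == positive:
                fn += 1  # intrusion missed, reported as normal
            else:
                tn += 1  # normal record correctly left alone
    return tp, fp, tn, fn


# Made-up example: three real attacks and three normal records.
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "normal"]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (TP, FP, TN, FN) = (2, 1, 2, 1)
```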

Performance of the proposed work is calculated by using the following measures:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} $$

Accuracy is the ratio of correctly identified instances to the total number of records.

$$ \text{Precision} = \frac{TP}{TP + FP} \tag{2} $$

Precision is the ratio of correctly identified intrusions to all records classified as intrusions.

$$ \text{Recall} = \frac{TP}{TP + FN} \tag{3} $$

Recall is the ratio of correctly identified intrusions to all actual intrusions, including those that were missed.

$$ \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} $$

The F1-score is the harmonic mean of precision and recall.
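
Equations (1) to (4) can be checked with a short worked example. The sketch below is illustrative only: the function names are hypothetical, the counts TP = 40, FP = 10, TN = 45, FN = 5 are made up, and returning 0.0 when a denominator is zero is an assumed convention that the paper does not specify.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute Eqs. (1)-(4) from the four confusion counts."""
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)            # Eq. (1)
    precision = safe_div(tp, tp + fp)                          # Eq. (2)
    recall = safe_div(tp, tp + fn)                             # Eq. (3)
    f1 = safe_div(2 * precision * recall, precision + recall)  # Eq. (4)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for a 100-record test set.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85/100 = 0.85, precision is 40/50 = 0.80, recall is 40/45 ≈ 0.8889, and the F1-score is 80/95 ≈ 0.8421.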

5.1 Datasets

Two datasets have been used, NSL-KDD and KDD-Cup’99. They are publicly available benchmark datasets and have been widely used by researchers in the intrusion detection domain.

KDD Cup’99:

In 1998, MIT Lincoln Labs prepared an intrusion detection assessment program known as the DARPA IDS evaluation program. It collected a network log containing intrusions simulated in a military network environment for evaluation purposes [20]. The KDD Cup’99 dataset was later built from this data. The dataset contains 4,900,000 records with 41 features (e.g. duration, flag, land), which are broadly grouped into three main classes. As it is a labelled dataset, each record is labelled either as normal or with an attack type. Most researchers use a 10% subset of the original dataset because working with it requires less computation. The dataset needs to be pre-processed before use. Pre-processing consists of transforming string or symbolic values into numeric values to make learning easier.
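
The symbolic-to-numeric transformation described above can be sketched as a simple integer encoding. The paper does not say which encoding was used, so this is one possible choice, not the authors' procedure; the helper name `encode_symbolic` and the three toy records are hypothetical.

```python
def encode_symbolic(records, symbolic_columns):
    """Replace string values in the given columns with integer codes.

    Returns the encoded records and the value-to-code mapping per column,
    so the same mapping can be reused on a test set.
    """
    mappings = {col: {} for col in symbolic_columns}
    encoded = []
    for record in records:
        row = dict(record)  # copy, so the input records stay untouched
        for col in symbolic_columns:
            codes = mappings[col]
            # Assign the next unused integer the first time a value is seen.
            row[col] = codes.setdefault(record[col], len(codes))
        encoded.append(row)
    return encoded, mappings


# Made-up records using KDD-style feature names for illustration.
toy_records = [
    {"duration": 0, "protocol_type": "tcp", "flag": "SF"},
    {"duration": 2, "protocol_type": "udp", "flag": "SF"},
    {"duration": 0, "protocol_type": "tcp", "flag": "REJ"},
]
encoded, mappings = encode_symbolic(toy_records, ["protocol_type", "flag"])
print(encoded[2])  # {'duration': 0, 'protocol_type': 0, 'flag': 1}
```

Integer codes impose an arbitrary order on the categories. One-hot encoding avoids this at the cost of more columns, which matters more for distance-based or margin-based classifiers than for tree-based ones.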

NSL-KDD:

The NSL-KDD dataset is an improvement over KDD Cup’99 with reduced redundancy. The number of features is the same as in KDD Cup’99 [20], [21]. Although this dataset has also faced criticism, it is still used extensively worldwide. The whole dataset has been used for 5-class classification (Table 1). The reasons for using NSL-KDD are as follows:

1. Redundant records are not present in the training dataset, so the classifier is not biased towards frequent records.
2. The test dataset is free from duplicate records, so the evaluation is not biased towards methods that have better detection rates on frequent records.

Table 1. NSL-KDD 5-Class Performance

6 Results and Discussions

The results indicate that, of all the classifiers used, RFC performs best in terms of accuracy, precision, recall and F1-score. In addition, the R2l and U2r classes have fewer records than the other classes, and their accuracy and other metrics are correspondingly low.
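
The effect of a rare class on the reported metrics can be illustrated with a small per-class calculation. The numbers below are made up and are not the paper's results; the function name `per_class_recall` is hypothetical. The point of the sketch is that overall accuracy can stay high while recall on a minority class such as U2r is poor.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class label: recall} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up test set: 95 DoS records and only 5 U2r records.
y_true = ["DoS"] * 95 + ["U2r"] * 5
# Hypothetical classifier output: every DoS record is right, 4 of 5 U2r are missed.
y_pred = ["DoS"] * 95 + ["DoS"] * 4 + ["U2r"]

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_by_class = per_class_recall(y_true, y_pred)
print(f"overall accuracy: {overall_accuracy:.2f}")
print(recall_by_class)
```

Here the overall accuracy is 0.96, yet only one of the five U2r records is detected (recall 0.2), which is why per-class metrics are reported alongside accuracy.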

6.1 KDD Cup’99 Evaluation

This section provides the evaluations made on KDD Cup’99 dataset.

5-Class Classification:

5-class classification uses the standard five classes: Normal, DoS, U2r, Probe and R2l. The 10% subset of KDD Cup’99 has been used, which is common practice. The results indicate that two of the five classes, R2l and U2r, show poor performance. The remaining classes reach significant levels of accuracy, precision, recall and F1-score. The results also show that the overall performance of the Random Forest Classifier is the best and that SVM also performs well, whereas Naïve Bayes is the worst performer in terms of accuracy (Table 2).

Table 2. KDD-Cup’99 5-Class Performance

7 Conclusion and Future Work

This work has used the benchmark datasets KDD Cup’99 and NSL-KDD for performance evaluation. Comparisons have been made between 5-class classification on both datasets. We found that RFC performs best in both scenarios. Classes such as U2r and R2l do not give very promising results because of the small number of instances available for training. This suggests that improving the performance of present techniques on rare attack classes needs immediate attention from researchers. The results also suggest that, for a particular attack class, some classifiers perform better than others. The main reason is that different algorithms are designed differently and suit data with particular characteristics.

In future work, improvements will be made in the direction of dealing with the class imbalance problem. We will work on improving the existing evaluations by using more efficient shallow learning and deep learning methods, and so extend the proposed work to gain further benefit from it.

References

1. Dong, B., Wang, X.: Comparison deep learning method to traditional methods using for network intrusion detection. In: 8th IEEE International Conference Communication Software Networks, pp. 581–585 (2016)
2. Axelsson, S.: Intrusion detection systems: a survey and taxonomy. Tech. Rep. 99, 1–15 (2000)
3. Hodo, E., Bellekens, X., Hamilton, A., Tachtatzis, C., Atkinson, R.: Shallow and deep networks intrusion detection system: a taxonomy and survey. CoRR, abs/1701.0, pp. 1–43 (2017)
4. Executive summary: Data growth, business opportunities, and the IT imperatives—The digital universe of opportunities: Rich data and the increasing value of the Internet of Things. https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
5. Machine learning. https://en.wikipedia.org/wiki/Machine_learning
6. Alpaydin, E.: Introduction to Machine Learning. MIT Press, Cambridge (2010)
7. Sommer, R., Paxson, V.: Outside the closed world: on using machine learning for network intrusion detection. In: Proceedings of the IEEE Symposium Security Private, pp. 305–316 (2010)
8. Chowdhury, M.N., Ferens, K., Ferens, M.: Network intrusion detection using machine learning, pp. 30–35 (2010)
9. Alpaydın, E.: Introduction to machine learning. Methods Mol. Biol. 1107, 105–128 (2014)
10. Kumar, S., Viinikainen, A., Hamalainen, T.: Machine learning classification model for network based intrusion detection system. In: 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST), pp. 242–249 (2016)
11. Wang, H., Gu, J., Wang, S.: An effective intrusion detection framework based on SVM with feature augmentation. Knowl.-Based Syst. 136, 130–139 (2017)
12. Ahmad, I., Basheri, M., Iqbal, M.J., Rahim, A.: Performance comparison of support vector machine random forest and extreme learning machine for intrusion detection. IEEE Access 6, 33789–33795 (2018)
13. Kotsiantis, S., Kanellopoulos, D., Pintelas, P.: Handling imbalanced datasets: a review. In: GESTS International Conference on Computer Science and Engineering, vol. 30, pp. 25–36 (2006)
14. Monard, M.C., Batista, G.E.A.P.A.: Learning with skewed class distribution. In: Advances in Logic, Artificial Intelligence and Robotics, Sao Paulo, SP, pp. 173–180. IOS Press (2002)
15. Random Forest Classifier. https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf. Accessed 19 April 2018
16. Sahu, S.: Network intrusion detection system using J48 decision tree. In: 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2023–2026 (2015)
17. Belavagi, M.C., Muniyal, B.: Performance evaluation of supervised machine learning algorithms for intrusion detection. Procedia Comput. Sci. 89, 117–123 (2016)
18. Li, Y., Xia, J., Zhang, S., Yan, J., Ai, X., Dai, K.: An efficient intrusion detection system based on support vector machines and gradually feature removal method. Expert Syst. Appl. 39(1), 424–430 (2012)
19. Kumar, G.: AI based supervised classifiers: an analysis for intrusion detection. In: Proceedings of the International Conference on Advances in Computing and Artificial Intelligence, pp. 170–174 (2011)
20. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. IEEE Symp. Comput. Intell. Secur. Def. Appl. CISDA, 1–6 (2009)
21. Dhanabal, L., Shantharajah, S.P.: A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4(6), 446–452 (2015)
22. Zamani, M., Movahedi, M.: Machine learning techniques for intrusion detection. Comput. Sci. 2, 1–11 (2015)
23. Sharma, R.K., Kalita, H.K., Borah, P.: Analysis of machine learning techniques based intrusion detection systems á supervised learning. In: Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, vol. 44, pp. 485–493 (2016)
24. Chowdhury, M.N., Ferens, K., Ferens, M.: Network intrusion detection using machine learning. Int. Conf. Secur. Manag. 4, 30–35 (2010)
25. Hamid, Y.: Machine learning techniques for intrusion detection: a comparative analysis. In: ICIA, vol. 7, pp. 0–5. ACM (2016)
26. Ambusaidi, M., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016)
27. Singh, R., Kumar, H., Singla, R.K.: An intrusion detection system using network traffic profiling and online sequential extreme learning machine. Expert Syst. Appl. 42(22), 8609–8624 (2015)
28. Proti, D.D.: Review of KDD cup, NSL-KDD and kyoto, datasets. Mil. Tech. Cour. 66, 580–596 (2006)
29. Angelo, P., Resende, A., Drummond, A.C.: A survey of random forest based methods for intrusion detection systems. ACM Comput. Surv. 51(3), 52:1–52:27 (2018)
30. Devaraju, S., Ramakrishnan, S.: Performance analysis of intrusion detection system using various neural network classifiers. Int. Conf. Recent Trends Inf. Technol. ICRTIT 2011, 1033–1038 (2011)

Author information

Authors and Affiliations

University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Divya Nehra, Krishan Kumar & Veenu Mangat

Corresponding author

Correspondence to Divya Nehra.

Editor information

Editors and Affiliations

Papua New Guinea University of Technology, Lae, Papua New Guinea
Ashish Kumar Luhach
Computer Science Department, Namibia University of Science and Technology, Windhoek, Namibia
Dharm Singh Jat
Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
Kamarul Bin Ghazali Hawari
School of Computing, University of Eastern Finland, Kuopio, Finland
Xiao-Zhi Gao
Department of Mathematics and Computing Science, Saint Mary's University, Halifax, Canada
Pawan Lingras

About this paper

Cite this paper

Nehra, D., Kumar, K., Mangat, V. (2019). Pragmatic Analysis of Machine Learning Techniques in Network Based IDS. In: Luhach, A., Jat, D., Hawari, K., Gao, XZ., Lingras, P. (eds) Advanced Informatics for Computing Research. ICAICR 2019. Communications in Computer and Information Science, vol 1075. Springer, Singapore. https://doi.org/10.1007/978-981-15-0108-1_39

DOI: https://doi.org/10.1007/978-981-15-0108-1_39
Published: 17 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0107-4
Online ISBN: 978-981-15-0108-1
eBook Packages: Computer Science, Computer Science (R0)
