Keywords

1 Introduction

Network based IDS are the software employed within the networks at some deliberated point to analyze network circulation on the whole subnet. The traffic log is matched through the database of recognized attacks and if a spasm is spotted or security policy violation is detected, a signal is passed to the network supervisor. NIDS are classified as On-line NIDS and Off-line NIDS. On-line NIDS are those which are able to work with the real-time networks whereas the Off-line ones are those who works over the repository of data and analyze the data in such a way to identify the attacks and normal instance.

In recent trend, the main attention of researchers has been inclined towards the machine learning techniques and neural network techniques like Random Forest, Support Vector Machines, Naïve Bayes and Decision Trees [1]. These techniques have been achieving better and improved performance in detection accuracy for network1 intrusion detection system. Machine learning taxonomy has given a whole new meaning to the field of intrusion detection when used up to its potential [2, 3].

To address the improvement required in the field of intrusion detection this new strategy is proposed.

2 Background

This section provides the related material which is necessary to realize the stimuluses and the idea behind the anticipated work in this work.

2.1 Network Intrusion Detection System

Now a day dependence over the organizations that relies on gradually demanding application of information technology is increasing rapidly. Thus service provider software is more prone to vulnerabilities and the errors involved are economically high in cost to be solved. This scenario leads to the need and innovation of a strong network monitoring system which can deal with the following pertinent concerns:

  • Dimensionality of data: The dimensionality of stored as well as passing by data over network is increasing massively and will be continue to increase. According to the forecast made in [4] the amount of data will reach up to 44ZB by 2020. Deploying NIDS to deal with such big amount of data is a major challenge.

  • Reliability: To achieve desired levels of reliability in terms of accuracy, the existing techniques are somewhere lacking. Hence more granular datasets, more visualization of data is required to achieve more promising results.

  • Mélange: The present scenario is focusing on developing ensemble and customized protocols using various algorithms and network attributes. Consequently, identification of nefarious and normal behavior is becoming a cumbersome task.

  • Imbalanced datasets: This problem arises when datasets consist of such classes which has fewer or smaller number of instances. Due to it, NIDS becomes unable to precisely predict such classes and becomes more prone to errors.

2.2 Machine Learning

According to Wikipedia, machine learning is subclass of artificial intelligence in the domain of computer science that empowers the computers with the ability to “learn” the data by using the statistical techniques, without being explicitly programmed [5]. Therefore, machine learning is programming the computers to enhance a benchmark efficiency via past practice or stored data. Machine learning make uses of the philosophy of statistics to build up mathematical prototypes to make out a corollary from an illustration. Various example of machine learning applications is basket analysis using learning associations which says 70% of customers who buy bread also buy butter, classification problem in which two or more classes are present and by making use of machine learning algorithms the appropriate class of the instance is predicted, pattern recognition which consists of face recognition, medical diagnosis and speech recognition etc. [6]. Machine learning algorithms are divided as following types:

  • Supervised learning: the aim of this learning is to memorize the patterns or mapping of input to output whose labels or results are provided by the supervisor himself [7].

  • Unsupervised learning: in this type of learning no supervisor is present and only input is provided. Here, the goal is to discover the symmetries in the input. The concept of clustering is used here to make clusters of similar patterns [8].

  • Reinforcement learning: this learning selects an action out of sequence of actions and learns the policy which was being used by the sequence of actions to reach the goal. Here the aim is to learn the goodness of policies and generate a policy [6, 9].

3 Existing Work

In this section, the most recent prominent works has been discussed.

The goal NIDS using machine learning is to breed a minimal rule set to detect malicious actions deviating from past behaviors. There are quite a few existing workings in the field of Network IDS. The work by [8] propose a new method to Network intrusion detection and achieved a FNR = 1.15%, FPR = 0.09% and detection accuracy of 98.76% in comparison to another SVM based scheme they’ve achieved FPR = 4.2%, FNR = 7.77% and detection accuracy of 88.03%. [10] propose a machine of generating learning model for NIDS by comparing five machine learning based models and achieved detection accuracy of 99.4%. They’ve compared the results with reduced feature set and without reduced feature set. Moreover, one more comparison is made between 10 fold cross-validation results and percentage split results.

[11] propose a machine learning based approach using SVM with augmented features. They have implemented the marginal density ratios transformation method to obtain improved detection rate for SVM. The dataset used is NSL-KDD and the results shows the robust performance results. [12] proposes an IDS on the basis of performance comparison between SVM, RFC and ELM to resolve concerns of performance. The use of these techniques shows limitations of large datasets, huge traffic data and gives an efficient classification technique. [13] analyzed methods for management of datasets related to imbalacing and they concludes that minority classes are not capable for learning as compare to majority classes. [14] has discussed problems regarding learning with skewed class scatterings and effect of it over performance of classifiers. The analysis was conducted for artificial intelligence and computational intelligence and confirms the requirement of building efficient intrusion detection systems. In [23], analysis of artificial NN, decision tree, support vector machine, Bayesian networks and a self-organizing map has been done. Even though high and desirable results have been achieved using machine learning but still machine learning consists of some vulnerabilities, such as misclassification of network data due to poison learning. Such vulnerabilities in the system affect performance. So such problems of machine learning need to be addressed.

4 Classifier Used

In our proposed work, following five algorithms have been used on two different datasets i.e. KDD Cup’99 and NSL-KDD. 10-fold CV approach has been applied with the help of Scikit Learn.

  • Random Forest Classifier: these classifiers are from the family of ensemble or forest of decision trees. This family generally have low bias and high variance and are perfect contenders for ensemble method. The bootstrap aggregating or bagging technique is generally used in this classifier to achieve increased variance without altering the bias [15].

  • J48: it is a predictive learning technique which make predictions for the new instance on the basis of prior available information. It creates a decision tree using the values of available data [16].

  • Naïve Bayes: these classifiers belongs to the family of probabilistic classifier. It uses bayes rule of conditional probability. Naïve bayes observes each feature individually as well independently of other features contained by model [17].

  • SVM: these classifiers are best suited for multiclass classification problems for big datasets and one of the superfast machine learning classifier with low computational resources [18]. This family supports classification as well as regression.

  • BayesNet: These are the sub set of Bayesian networks with nominal attributes and no missing values [19].

5 Calculations

Related to most of the existing research, our proposed work was implemented using Python. All evaluation was performed using 64-bit Windows 10 Pro with an Intel® Core™ i5-8250 CPU @ 1.60 GHz 1.80 GHz with 8.00 GB RAM and an NVIDIA GeForce MX150 GPU. Two of the benchmark datasets of the domain of intrusion detection i.e. KDD Cup’99 as well as NSL-KDD datasets are used for performance evaluation.

The used metrics are as follows:

  • True Positive(TP) – those occurrences which are correctly categorized as an intrusion.

  • False Positive(FP) – those occurrences which are incorrectly categorized as an intrusion.

  • True Negative(TN) – those occurrences which are correctly categorized as normal.

  • False Negative(FN) – those occurrences which are incorrectly categorized as normal.

Performance of the proposed work is calculated by using the following measures:

$$ {\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN} $$
(1)

The measure of accuracy is appropriately identified instances to the total number of records.

$$ {\text{Precision}} = \frac{TP}{TP + FP} $$
(2)

The precision is the measure of correctly identified records to the incorrectly identified records.

$$ {\text{Recall}} = \frac{TP}{TP + FN} $$
(3)

The recall is the measure of correctly identified records to the number of missed records.

$$ {\text{F1}} - {\text{Score}} = 2*\frac{Precision*Recall}{Precision + Recall} $$
(4)

The F1-Score is the measure of harmonic mean of recall and precision.

5.1 Datasets

Two datasets have been used i.e. NSL-KDD and KDD-Cup’99. They are publically available benchmark datasets and have been massively used by the researchers of intrusion detection domain.

KDD Cup’99:

In 1998 MIT Lincoln Labs prepared the intrusion detection assessment program named as DARPA IDS evaluation program. The network log consisting of intrusions imitated in military network environment for survey purpose was conducted [20]. Later on, the KDD Cup’99 dataset utilized it. This dataset contains 4900000 number of records with 41 type of features (e.g. duration, flag, land) and these features are broadly classified into three main classes. As it is a labelled dataset so each record is labelled as normal or attack (attack type). Most of the researchers make use 10% subset of original dataset as working with it requires less computation. The dataset needs to be pre-processed before usage. The pre-processing consists of transformation of string or symbolic values to numeric values to make learning easier.

NSL-KDD:

The NSL-KDD is the improvement over KDD Cup’99 with reduced number of redundancy. The number of features is same as of KDD Cup’99 [20], [21]. Though this dataset has also faced criticism but still it is being used extensively world-wide. Whole of the dataset has been used for 5-class classification. Following are the various reason to use NSL-KDD (Table 1):

Table 1. NSL-KDD 5-Class Performance
  1. 1.

    Redundant records are not present in train dataset so classifier is free from producing biased results.

  2. 2.

    Test dataset is free from duplicate records which helps in better reduction rates.

6 Results and Discussions

Results obtained are indicating that out of all the classifier used, RFC is performing the best in terms of Accuracy, Precision, Recall and F1-Score. Moreover, one more analysis is made regarding the number of records available for the R2l and U2r class are less as compare to other classes so is the accuracy and other metrics is also low.

6.1 KDD Cup’99 Evaluation

This section provides the evaluations made on KDD Cup’99 dataset.

5-Class Classification:

5-Class classification consists of the standard 5 classes i.e. Normal, DoS, U2r, Probe, R2l. 10% subset of KDD Cup’99, which is a common practice, has been used. The results indicate that 2 out of 5 classes shows poor performance i.e. R2l and U2r. The rest of the classes offer significant level of accuracy, precision, recall and f1-score. Moreover, it can also be observed from the results that the overall performance of Random Forest Classifier is the best and SVM also outperforms whereas naïve bayes is the worst performer in terms of accuracy (Table 2).

Table 2. KDD-Cup’99 5-Class Performance

7 Conclusion and Future Work

This work has used the benchmark datasets KDD Cup’99 and NSL-KDD to make performance evaluations. The comparisons have made between 5-class classification of both the datasets. On comparison, we found that the RFC is performing the best in both scenarios. Moreover, it may also be noted that the classes like U2r and R2l are not giving very promising results because of the number of instances available for training. It suggests that efforts for refining the performance of present techniques for rare attack classes needs instant addressing by scholars. Moreover, the results obtained also suggests that for a particular attack class, some classifiers perform better than the others. The significant reason for that is different algorithms are designed differently to work with their particular characteristics.

In future work, the improvement will be made in the direction of dealing with class imbalancing problem. We will work upon improvement of existing evaluations by utilizing more efficient methods like shallow learning and deep learning. Hence we can extend the proposed work to achieve more and more merits out of it.