Abstract
As the Internet is expanding, the threat of intrusion is also increasing. Modern businesses which use Internet demand for strong computer and network security. Intrusion prevention is obviously the best choice from security viewpoint, but it has practical limitations as hackers develop new methods to breach security. Thus early detection of an intrusion is the sensible option and Network Intrusion Detection Systems (NIDS) carry out that task. In this paper, the authors propose extended empirical evaluation model specifically from NIDS perspective. The objective is to introduce multiple classifiers into the model along with feature selection method for improving performance of the classifiers. Additionally, the new model incorporates a feedback mechanism to ensure new prediction learn from rectifications of past records.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
The primary objective of Network Intrusion Detection Systems (NIDSs) is continuous examination of the network dataflow and identification of the malicious activity. When any doubtful activity is observed, it raises an alert to the (network or system) administrator. As mentioned by Biggio (2010) in [5], the main goal of NIDS is to differentiate between legitimate and intrusive network traffic. Intrusion detection has two approaches- (i) Misuse detection and (ii) Anomaly detection. In Misuse detection approach, the network traffic is compared with different kind of features of “known” attacks [14], whereas Anomaly detection approach, instead of looking for an exact match, looks for irregularity among other regular patterns. In this, a training set is created, and updated at regular time intervals to observe the changes in normal traffic during operation [14]. Though this approach resolves the limitation of Misuse detection, it could still fail if the hacker wisely constructs network packets which form wrong learning patterns for intrusion detection and sends over the network.
NIDS should be able to handle the massive volume of data, so the pattern classification techniques have been focused by researchers. As mentioned by Duda [3], the primary goal in pattern classification is to hypothesize the class of given models, process the identified data to eliminate noise, and for any identified pattern choose the model that fits the best. In simple words, it classifies the given set of patterns into a set of labels by analyzing the attributes (features) associated with the patterns.
Each packet entering the NIDS would be analyzed by the pattern classifier. A net-work packet contains number of attributes. However, not all attributes are much significant in deciding whether the packet is legitimate or malicious. So, if NIDS considers only important attributes which impact in deciding the category without adverse effect on result, then it would certainly reduce processing time. Feature Selection is the process of recognizing and eliminating the irrelevant and redundant features. Its advantages are [4] - (i) irrelevant features do not impact significantly on the accuracy of prediction, and (ii) redundant features do not improve of the result of prediction.
In this paper, the authors discuss on the study of classification model of attacks, classifier evaluation models, pattern classifiers for intrusion detection and feature selection methods in Sect. 2. Section 3 covers new systems implementation details, such as architecture, algorithms used. Then Sect. 4 presents test results observed so far. Finally it is ended with the conclusion in Sect. 5.
2 Related Work
This section covers study of previous research papers related to pattern classifiers and feature selection. The brief review of existing related work is discussed along with the key points taken up for this enhancement.
2.1 Empirical Performance Evaluation Model
The empirical performance evaluation model [1] was devised by Biggio et al. (2013) to resolve the limitations of classical model. The classical methods such as k-fold or bootstrapping [1] assume that the data distributed in training dataset will appear during the actual operation as well. In case of causative attacks [6], this could be useful, because the pattern during attack will generally be same as in the training dataset. But in case of other types of attack, it fails significantly. The pattern distribution during the attack is practically different than in training dataset.
To make the classifier ready before the occurrence of attack, the empirical performance evaluation model proposed an algorithm for forming the training set and testing set. As mentioned in [1], the empirical model has been tested for Indiscriminate Causative Integrity attack which permits at least one intrusion [6]. The empirical model used the testing strategy of increasing the number of malicious records injected into the training set and observed that the classifier performs correctly when number of malicious samples are less than or equal to legitimate samples.
However, it is a generalized model and only one kind of classifier (SVM) was tested for NIDS application.
2.2 Choosing a Pattern Classifier
Among many pattern classification techniques, the authors want to select a few, important ones which can be used in the new model. SVM pattern classifier was chosen as it was already referred in empirical evaluation model. Then the authors referred recent papers to understand the trend of selecting pattern classifiers by the re-searchers, specifically in network security domain. Observed that, k-NN and Naïve Bayes are majorly focused [8,9,10]. So they are selected. From implementation viewpoint, more details on SVM, k-NN and Naïve Bayes are obtained from [11, 12].
2.3 Choosing a Dataset
The Third International Knowledge Discovery and Data Mining Tools Competition were organized in association with The Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99). The task for competition was to develop a network intrusion detector. It should possess prediction ability to categorize the connections into “good” (normal or legitimate) or “bad” (intrusions or attacks). The data set KDD-99 (also referred as KDDCup99) contains various intrusions which were simulated in military network environment [13]. KDD-99 is considered as a standard dataset for testing of classifiers for NIDS.
2.4 Feature Selection Methods
Feature selection is the process of selecting “interesting” features from the dataset [15]. The fast clustering-based feature selection algorithm (FAST) [2] is a latest technique devised by Song et al. (2013). This algorithm is based on MST (minimum spanning tree). As mentioned in [2, it is tested on 35 publicly available datasets and has shown encouraging results as compared to many other techniques viz. CFS, FCBF, Relief, Consist and FOCUS-SF.
As mentioned in [2], the Fast clusteringbAsed feature Selection algoriThm (FAST) is using MST (minimum spanning tree) concept. It works in three steps- (i) remove irrelevant features as they will not be used in further steps. (ii) construct MST from relevant features. (iii) cut down the MST to form a cluster of relevant features and then the most relevant (representative) features can be selected from each cluster. Thus, only a small number of important features are presented as output.
3 Implementation Details
The new system is an extension to the empirical model from NIDS perspective. It covers (i) use of multiple classifiers, (ii) introducing feature selection method, (iii) facility to compare classifiers result, and (iv) prediction and learning mechanism.
3.1 Architecture
Building blocks of new model are explained below-
-
Preprocessing: The module accepts input data file (KDDCup99 10%) and applies basic preprocessing rules such as noise and duplicate removal. Then the training set generation algorithm [1] is executed to generate a dataset consisting n samples. Same algorithm is used for creation of testing dataset as well. To serve the practical purpose of the model, it provides save and load facility to the user.
-
Multiple Classifiers: New model supports multiple pattern classifiers to be tested on tested on the given dataset. SVM, k-NN and Naïve Bayes are implemented here.
-
Feature Selection: The FAST algorithm [2] is implemented to reduce the dimensionality of dataset. This reduced dataset is provided as input to multiple classifiers. KDDCup99, the input dataset, contains 41 features. However all 41 features need not be so important in deciding the category of packet. Also, whenever any network packet is to be examined, all 41 features will be checked against the trained dataset. In order to reduce the processing time, it is substantial to identify important features and carry out classification on that reduced dataset. But, at the same time, the accuracy of classifier should not be compromised.
-
Result Presentation: The performance of multiple classifiers can be evaluated here when using full dataset as well as reduced dataset. Evaluation parameters are- Accuracy percentage and Time taken for processing. Accuracy means the percentage of records correctly classified. Processing time is difference between start-time and end-time of algorithm execution as captured in the logs and it is measured in milliseconds. The result is presented in tabular and graphical format Fig. 1.
-
Learning module: This module facilitates prediction of records classification. If new, unseen record is not found in training dataset, then it uses probability-based prediction [7] technique and predicts its class. In case, the obtained result is wrong, the administrator can rectify it and such history is explicitly maintained as supplementary to training dataset. The workflow is shown in Fig. 2.
3.2 Mathematical Model
Since the proposed model comprised of interconnected module and each module consists of number of processes and has definite input and output, the mathematical model is represented in the form of set theory.
Let S is universal set of processes involved in the new system. S is represented as: S = {P,F,C,L,R}
-
Preprocessing: Let P is set of processes required for preprocessing activity.
P = {Pr,Pn,Pd,Ptr} where,
Pr is process for reading input file
Pn and Pd are processes for removal of noise and duplicates respectively
Ptr is process for generating training set
-
Feature Selection: Let F is set of processes for feature selection implementation.
F = {Pr,Fc,Fr} where,
Fc is calculating Symmetric Uncertainty
Fr is generating set of significant features
-
Classification: Let C is set of classifiers, such as one-class SVM, k-NN and Naïve Bayes. Thus, C = {Csvm,Cknn,Cnb,Cr,Ca} where,
Ca is process of calculating accuracy of classifier result Cr
-
Learning: Let L is the set of processes required for prediction and learning module.
L = {Pr,Lp,Lh} where,
Lp is probability-based class prediction
Lh saves rectified result for future use
-
Result: Let R is the set of processes required for management of result generation which includes definition of metrics, data collection and graphical presentation.
4 Results and Discussion
4.1 Testing Approaches and Results
The testing approach mentioned in [1] is followed here. It suggests varying the number of legitimate and malicious samples in training dataset. All classifiers are processed for the given dataset. Additionally, the feature selection algorithm is executed and all classifiers are processed for the reduced dataset as well Table 1.
Analysis of classifiers based on processing time:
Graph shows the comparison of classifier according to their average processing time for each sample, across different size of training sets. As the size of training set increases, processing time of SVM is also increasing. As SVM took too much time (2020s i.e. 33 min) for 30000 records, SVM is not tested on further larger training sets. KNN performs moderately whereas Naive Bayes is indeed a winner here. Also observed that, though SVM took 33 min for 30000 records set, the system completed the task successfully; the code did not crash. This indicates the robustness of the system.
Analysis of classifiers based on accuracy:
This graph shows the comparison of classifier according to their accuracy across different size of training sets. Starting from the training set of 5000 records, Naive Bayes classifier had low accuracy. However, it improves significantly when tested on larger size training sets. KNN performs moderately whereas SVM wins the race as it consistently gives almost 99% Figs. 3 and 4.
Analysis of average processing time of classifiers on machines with different RAM size:
Additionally, the classifier processing time is recorded for a fix-sized training set of 10000 records (5000 normal and 5000 malicious samples), on two more machines. These three machines have identical configuration, except RAM size.
4.2 Comparative Analysis
The proposed model is an extension to empirical evaluation model and the authors have tried to cover few limitations in this enhancement. For comparative analysis, the key differences between two models are presented below Fig. 5 and Table 2:
5 Conclusion
Considering the reviewed literature, the authors proposed enhancement to the empirical model to mold it for NIDS. The steps for dataset construction and SVM are continued as mentioned in empirical model and also the SVM. The extension to the model has been done in 3 stages- (i) introducing feature selection method on the training dataset, (ii) incorporating multiple classifiers in the model, and (iii) introducing the prediction mechanism with facility to rectify a wrong prediction and corrected result can be used for future use. This is advantageous for practical use of the model.
The test results obtained so far show that SVM is consistent in achieving highest accuracy whereas Naïve Bayes is superior in processing time. K-NN performs moderately. Also, Feature selection improves the performance of classifier by reducing the processing time but impacts their accuracy.
References
Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under attack. IEEE Trans. Knowl. Data Eng. 26(4), 984–996 (2014)
Song, Q., Ni, J., Wang, G.: A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans. Knowl. Data Eng. 25(1), 1–14 (2013)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience Publication, Hoboken (2000)
John, G.H., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Proceedings of the 11th International Conference on Machine Learning, pp. 121–129 (1994)
Biggio, B., Fumera, G., Roli, F.: Multiple classifier systems for robust classifier design in adversarial environments. Int. J. Mach. Learn. Cybern. 1(1), 27–41 (2010)
Barreno, M., Nelson, B., Sears, R., Joseph, A.D., Tygar, J.D.: Can machine learning be secure? In: Proceedings of the ACM Symposium on Information, Computer and Communication Security (ASIACCS), March 2006
Thornton C.: Machine Learning Lecture 4: Naive Bayes Classifier. http://users.sussex.ac.uk/christ/crs/ml/lec02b.html
Safa, H., Hajj, W.E., Salem, F.K.A., Moutaweh, M.: Using K-nearest neighbor algorithm to reduce false negatives in P2P secure routing protocols. IEEE (2015)
Taruna, S., Hiranwal, S.: Enhanced Naive Bayes algorithm for intrusion detection in data mining. Int. J. Comput. Sci. Inf. Technol. 4(6), 960–962 (2013)
Altwaijry, H., Algarny, S.: Bayesian based intrusion detection system. J. King Saud Univ. – Comput. Inf. Sci. 24, 1–6 (2012). Elsevier
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Elsevier, Amsterdam (2006)
Naive Bayes Classifier. https://en.wikipedia.org/wiki/Naive_Bayes_classifier
KDDCup99 dataset. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Kang, D., Fuller, D., Honavar, V.: Learning classifiers for misuse and anomaly detection using a bag of system calls representation. In: Proceedings of the 2005 IEEE Workshop on Information Assurance and Security, United States Military Academy, NY, USA (2005)
Gaikwad N., Sangve, S.: Enhancement and evaluation of pattern classifiers in NIDS: a survey. Int. J. Comput. Sci. Inf. Technol. 6(6), 5435–5438 (2015)
Acknowledgements
The authors would like to thank the publishers and researchers for making their re-sources available. We thank the teachers for their guidance and suggestions. We are thankful to the college authority for providing necessary infrastructure and support.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gaikwad, N., Sangve, S. (2016). A Novel Model for NIDS with Evaluation of Pattern Classifiers and Facility of Rectification. In: Unal, A., Nayak, M., Mishra, D.K., Singh, D., Joshi, A. (eds) Smart Trends in Information Technology and Computer Communications. SmartCom 2016. Communications in Computer and Information Science, vol 628. Springer, Singapore. https://doi.org/10.1007/978-981-10-3433-6_71
Download citation
DOI: https://doi.org/10.1007/978-981-10-3433-6_71
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3432-9
Online ISBN: 978-981-10-3433-6
eBook Packages: Computer ScienceComputer Science (R0)