Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

As many organizations are facing the cyber-attack, confidentiality, integrity, and availability of the data is become a major issue. Intrusion detection system is designed to detect intrusion in a single host or in a network [1]. Formal type is called host based intrusion detection system while the later one is the second type of intrusion detection system and is called, network based intrusion detection system. Host based intrusion detection system use system log files and other logging mechanism to identify any attack. It resides on a single host system and that is why it highly depends on operating system architecture. Any shortcoming of operating system may compromise intrusion detection system as well. On the other hand, network based intrusion detection system is deployed on a network segment and it analyzes the network packet for any attack [2]. Network based intrusion detection system is independent of the operating system, in fact it is transparent to the operating system of the host as it do not reside on the host system. Both types of intrusion detection system use intrusion detection method for the detection of intrusion.

There are two types of intrusion detection method. One is called signature based intrusion detection method and the second type is called anomaly based intrusion detection method [3]. Signature based intrusion detection method also known as misuse based detection method looks for pattern or signature in a data that complies within a malware [4]. Signature based IDS has a database consists of signatures of the attacks. Signature based IDS uses the stored signatures of the malware for the detection. Therefore, this method has high true positive rate. The problem with the signature based technique is that it cannot detect novel attacks as no signature exist yet for the novel attack [5]. Contrarily anomaly based detection method can detect novel attack as it looks for abnormal behavior that do not comply with the normal operation of the system or network. Major drawback of this method is that it has high false positive rate since it is difficult to define normal behavior of the system or network [6].

Many machine learning algorithms have been used for the implementation of the detection methods. These machine learning algorithms highly depend on the input features. Irrelevant, redundant, and noisy features causes the machine learning algorithm to develop the detection model with low accuracy rate and with high false positive rate. Therefore, these feature must be eliminated at the preprocessing step.

Rest of the report is organized as follows; Sect. 2 gives introduction about feature selection. Section 3 contains introduction of ant colony optimization (ACO) which is followed by related work in Sect. 4. Proposed methodology has been discussed in Sect. 5 and results are given and discussed in Sect. 6. At the end conclusion is given in Sect. 7.

2 Feature Selection

For classification problem feature selection is used for the elimination of irrelevant, redundant, and noisy features to improve the accuracy of the classification algorithm. This is done at the preprocessing step before applying any machine learning algorithm. Feature selection process selects a subset of features that represents the whole feature set [7]. Which features should be included or excluded is being decided in this process. Relevancy and redundancy are the two decisive factors for feature selection process [8]. Relevant features are those that envisage the desired system response, on the other hand, redundant features have a high degree of correlation among themselves. Thus, removal of the redundant features is desired. Features that are highly correlated with each other give no additional information. While robust features have a high degree of correlation relevant to desired decision and uncorrelated with other features in feature set. By using feature selection at preprocessing step, the predictive accuracy of the machine learning algorithms can be increased. Robust feature set also reduces the training time of the classifier as robust features are invariant in nature and reduces the dimension in high dimensional data. Reduced dataset also decreases dataset which acquire less storage space.

Feature selection process has four steps as shown in Fig. 1. Subset generation, subset evaluation, stopping criteria, and result validation [9]. Subset generation generates a different subset of features, and each feature subset is evaluated in subset evaluation process. If the current feature subset is better than previous feature subset than it replaces the previous one. Subset generation is a searching process, which can be complete, heuristic or random search. Generated feature subset is then validated by some tests.

Fig. 1
figure 1

Feature selection process

In network intrusion detection, features are extracted from protocols header at different layers of network architecture and contents of data packets. Due to this reason noise in channels propagate to extracted features, this leads to false intrusion alarm. There are two types of feature selection methods: Filter and wrapper. Filter method selects the subset of features without involving learning algorithm in evaluation phase and is mainly based on ranking of features, which represents the relevancy of the features [10]. In contrast, wrapper method evaluates a subset of features using learning algorithm [11]. This evaluating algorithm is called iteratively unless a robust subset of the feature is selected. Filter based approach is computationally fast compared to wrapper based as it doesn’t involve any learning algorithm during ranking of features. However, wrapper based feature subset accomplishes good accuracy rate as it involves learning algorithm in the subset evaluation phase [12].

3 Ant Colony Optimization

Ants used chemical substance for the communication and is called, pheromone. Ants used it to remember the path from source of food to nest. More intensity of pheromone attracts more ants. Initially ant’s foraging like behavior was used for traveling salesman problem (TSP) and the model was named ant system [13]. Ant colony optimization (ACO) has produced efficient results for NP-hard set problems. ACO has many variations. In this paper we used ant colony system (ACS), which uses two level pheromone update i.e. local pheromone and global pheromone update. In global pheromone update only those edges get pheromone update that belongs to best ant. A digital ant selects next node using some transition probability rule.

4 Related Work

Since feature selection is NP-complete problem that is why ACO has been widely adapted for feature selection. Below lists some of them.

George [14] utilized principle component analysis for feature reduction. Using principle component analysis 28 features were selected. The reduced feature set was validated using SVM. Tsang et al. [15] used independent component analysis and principle component analysis for his proposed model called, ant colony clustering model.

Gao et al. [16] proposed ant colony optimization method for KDD99 feature selection. The proposed method mapped the features into graph which were connected to each other, giving opportunity to each ant to select any feature. Selection of next feature by an ant was based on the heuristic information and pheromone value. Fisher discrimination rate was used as heuristic information. Edges contained the pheromone value and only the ant that resulted less squared error used global pheromone update on the edges visited during solution construction.

Nadia and Marcus [17] proposed a wrapper based feature selection based on ACO. The proposed method does not used traditional graph method, instead each feature is represented by 1 and 0 which indicated the selection of feature. An ant select next feature using a probability function which uses pheromone value and heuristic information. Each feature possessed pheromone value. While heuristic information described the desirability of feature which calculated by the number of ants visited that feature. Local pheromone update was used at each construction step while global pheromone was used by the best ant which update the pheromone value for all features that were selected by best ant.

Alwan and Mahamud [18] used mixed variable ant colony optimization for feature selection and at the mean time regulating C and γ parameters for SVM. SVM used RBF kernel function and the applied kernel function highly depends on C and γ value. During the feature selection mixed variable ant colony optimization method also searches for C and γ values that can improve the accuracy of SVM. Heuristic information adopted fisher discrimination rate. Pheromone value lied on the edges and only the ant that produced high accuracy for SVM was allowed to update the pheromone value on the edges used during solution construction.

5 Proposed Methodology

Ant colony optimization for feature selection has been proposed in this paper. Features are represented in a completely connected graph problem thus choice of selecting next feature is given to each ant. An ant moves to next feature using given transition probability.

$$p_{i,j} = { \hbox{max} }(\tau_{\text{j}} )^{\beta } .\upeta_{\text{j}}$$
(1)

Pheromone value (τ) is related to each feature instead on edges. At the start of solution pheromone value to each feature is initialized by its entropy value to the prediction of the class. β controls the importance of the pheromone value during selection of next feature. If β is 0 than pheromone value for the feature is completely ignored. Number of time the feature visited is considered as heuristic information (η). Initially heuristic information is kept 1 so that no feature can get biased heuristic at the start of the constructing solution by the ants.

At each solution construction, local pheromone value is updated using given formula,

$$\tau_{j} = (1 - \rho ).\tau_{\text{j}} + \vartriangle_{\text{j}} \cdot \sigma$$
(2)

where

$$\Delta _{\text{j}} = \left\{ {\begin{array}{*{20}l} \rho \hfill & {if\,j\,\upvarepsilon\,S^{ + } } \hfill \\ 0 \hfill & {otherwise} \hfill \\ \end{array} } \right.$$

S+ is the set of features visited for that particular run. σ ϵ (0,1] controls the value for Δϳ. After completing tour each ant passes its dataset to naïve bayes classifier. Ant’s dataset that results high accuracy rate gets global pheromone update and uses following equation.

$$\tau_{j} = (1 - \rho ).\tau_{\text{j}} + \emptyset_{\text{j}} \cdot \sigma$$
(3)

Where

$$\emptyset_{\text{j}} = \left\{ {\begin{array}{*{20}l} \rho \hfill & {if\,j\,\upvarepsilon\,global\,best\,ant\,tour} \hfill \\ 0 \hfill & {otherwise} \hfill \\ \end{array} } \right.$$

This whole method is repeated until stop criteria is met which is if none of the ants can improve accuracy for naive bayes classifier compare to the previous best ant result.

6 Results and Discusssions

In this paper KDD99 dataset has been used for evaluation of detection model. Feature subset is validated using LibSVM in Weka [19]. Binary classification has been used in the experiment. This dataset is widely adapted for the evaluation of detection model [20]. Dataset contains one normal class data and four attack classes’ data namely, Denial of service (DoS), Probe, Remote-to-Local (R2L), and User-to-Root (U2R). Training and testing dataset exist for this dataset. Training dataset hold 494,021 network records while testing dataset comprises of 311,029 network records. Each instance is represented by 41 features. Both datasets contains redundant instances which were removed. Subset of training dataset is generated which contains 5823 instances for each two classes, normal class and attack class. Since we used binary SVM therefore all the four attack classes are merged into single attack class. The test dataset used contains 77,287 instances.

By using ACO 14 features were selected. Feature subset is than validated using SVM. Result is compared with full feature set result. Figure 1 shows the result for normal class. It can be seen that true positive rate (TPR) for reduced feature set is improve to 98.5 % from 98.2 % for whole feature set. Moreover, false positive rate (FPR) for reduced feature set is 2 % compared to full feature set i.e. 3 %. Figure 2 depicts the result for the attack class. From result it can be seen that reduced feature set gave TPR 98 % which is better than full feature set results 97 %. Accuracy for both feature set is shown in Fig. 3. Accuracy rate for SVM has been improved from 97.72 to 98.29 % when classified with the reduced feature set. (See Fig. 4)

Fig. 2
figure 2

Result comparison for normal class

Fig. 3
figure 3

Result comparison for attack class

Fig. 4
figure 4

Accuracy comparison for feature set

7 Conclusion

High amount of data and irrelevant, redundant features make it difficult to build the prediction model for anomaly detection method. Features selection play vital role to build the prediction model in machine learning. In this paper feature selection method for anomaly detection has been presented. Ant colony optimization (ACO) has been proposed in the work due to its capability of utilizing previous information in the form of pheromones. SVM is used to build the anomaly detection model and the selected features are validated using this model. This work shows that robust features can be selected using ACO. Also fast and efficient detection method can be achieved using these robust features. This leads to real time detection of the intrusions in networks.