1 Introduction

An intrusion is an action carried out to break the security of a system and misuse it. Any system faces two main threats: malware and intruders. An intruder may be defined as a threat that attempts to break into a system and mislead it [1]. To counter intruders, researchers have introduced detection systems known as intrusion detection systems (IDS). This paper focuses on the intrusion detection system, how it works, and how its operation can be enhanced by reducing false positive alarms. Researchers have provided many techniques, such as firewalls and encryption [2], to protect the internal system from intruders or intrusion, but because of the shortcomings of these techniques, intruders still find ways to affect the system without any harm to themselves [2].

Nowadays it is very difficult to keep any system free from intrusion. The main function of an IDS is to detect unknown or abnormal activity in a particular system and to resolve it within a very short time [3]. IDSs are used to protect computers and their internal structure against various kinds of penetration. Such a system consists of hardware and software components that identify unexpected events indicating that an attack is about to happen, is in progress, or has already happened. An IDS can therefore be classified by when it raises its warning: before an attack, while the attack is in progress, or after the attack [3]. An IDS has three components: (a) a sensor, which senses network traffic or system activity and generates events; (b) a console, which controls the sensors, events and alerts; and (c) a detection engine, which generates alerts when it receives deviations from the security events. The detection engine also records the sensors' events in a database.

This paper contributes to the enhancement of IDSs and their handling of intrusions. The KDD-99 data set, which has 41 features, is used for the analysis. The main goal is to improve the accuracy of the IDS by reducing false positive alarms. The work presents the processing steps that are integrated into the IDS. Many machine learning approaches, both supervised and unsupervised, have been used to improve the efficiency of intrusion detection systems; they are reviewed in Sect. 2. The rest of the paper is organized as follows: Sect. 2 covers related work, Sect. 3 describes the proposed work, Sect. 4 presents the analysis and evaluation, and Sect. 5 summarizes the whole work.

2 Related Work

Machine learning is a field of study that allows a computer or other system to learn without being explicitly programmed. Two types of approach, supervised and unsupervised, are used to enhance the working of IDSs. Supervised approaches are able to construct a function from the given training data, and researchers have proposed several such techniques. Before any technique or approach is applied, the IDS and its function must be studied properly. Ahmed et al. [4] give a brief study of IDS function that focuses on four major classes of detection methods: (a) classification, (b) statistical, (c) information theory and (d) clustering. They also highlight the problems encountered during its operation. To improve the efficiency of IDSs, [5] gives a fuzziness-based semi-supervised learning (SSL) approach. SSL is used to improve the performance of the classifier for IDS, where unlabeled samples are used together with a supervised learning algorithm. To improve security, [6] provides a deep neural network that alerts the system to an attack, after which the sensor recognizes the malicious attack. Other supervised approaches used to develop IDSs include k-NN and fuzzy k-NN.

Ambusaidi et al. [7] propose combining an IDS with LSSVM on the KDD-99 dataset, which is also used in this paper for enhancing the intrusion detection system. Unsupervised learning, in turn, is a machine learning method in which a model is fitted to observations. There are a number of such approaches, including hybrid approaches and domain approaches. Erin Liong et al. [8] provide an unsupervised learning approach termed DH, which stands for deep hashing. Patel and Jhaveri [9] study several machine learning techniques, such as ANN, SVM, Q-learning and Bayesian networks, for recognizing malicious nodes and misbehaving nodes in the system. Alom and Taha [10] give a neuromorphic cognitive approach for detecting intrusions in cyber security with the help of deep learning. These are some of the techniques that researchers have given to enhance IDSs and make them more efficient.

3 Proposed Work

As discussed in Sect. 1, this paper works on the KDD-99 data set. In this section, the KDD-99 dataset with 41 features is processed through optimization and learning in order to attain accuracy, precision and recall. The methodology is organized into three phases: the first phase prepares the KDD-99 data set, the second phase optimizes the features, and the third phase passes them on for learning. KDD-99 is a benchmark dataset recognized by many users. For testing we select the 10% subset of KDD-99, which contains 41 features from KDD-91. The 10% KDD dataset contains 494,021 connections, and we set 311,030 connections for our work. These connections are labeled as normal or attack, and the attacks are classified into four categories: denial of service (DoS), probe (port scanning), unauthorized access to the root user (U2R), and unauthorized remote login to a machine (R2L) [11]. There are three kinds of features: (1) basic features, (2) content-based features and (3) time-based features [11]. These features are labeled X1, X2, …, Xn. After labeling, the features are used for optimization in our work.

Optimization is carried out by two algorithms: PSO and a learning algorithm. Initially, the features are passed to PSO for optimization to obtain a fitness value. Any feature that is not optimized by PSO is passed to the learning algorithm for further optimization. Once all the features have been optimized by these two algorithms, convergence is checked. If convergence is not reached, the features are optimized again by PSO and the learning algorithm. If the convergence check passes, we move to the next phase, which is learning.

After the dataset is labeled, feature optimization is performed by particle swarm optimization (PSO), where a swarm denotes a collection of particles. In PSO, particles float through a hyper-dimensional search space. PSO is a population-based search algorithm modeled on the social behavior of birds within a flock. The change in a particle's position in the search space depends on the psychological tendency of each particle to imitate the progress of others. A PSO swarm consists of a set of particles, each of which represents a potential solution [12]. The position of a particle varies according to its own experience and that of its neighboring particles. PSO is used to optimize the value of an objective function. Every particle moves through the space to find the point where the objective function is optimized; 'z' is the position of a particle at time 'h' and 'u' is its velocity. Every particle has a local best and a global best position in the space. The global best position is the position of the particle closest to the optimal value, and all the particles move towards it. The global best position varies as the particles move. The updated position is obtained using the following equations [12].

$$z_{r}(h) = z_{r}(h - 1) + u_{r}(h)$$
(1)
$$u_{r}(h) = I\,u_{r}(h - 1) + L_{1} V_{1} \left( z_{p\text{best}_{r}} - z_{r}(h) \right) + L_{2} V_{2} \left( z_{g\text{best}} - z_{r}(h) \right)$$
(2)

where 'I' is the inertia weight, L1 and L2 are the learning factors, and V1 and V2 are random values. Using Eqs. 1 and 2 we obtain the fitness value of a feature after optimization. If all the features are optimized, we proceed to the next step, the convergence check; otherwise we use a genetic algorithm (GA) to optimize the features left by the PSO algorithm. In this work the genetic algorithm is used to obtain the optimal or near-optimal threshold for feature selection. A genetic algorithm is a technique for finding exact or approximate solutions to optimization and search problems, and here it is used to solve the optimization.
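The position and velocity updates of Eqs. 1 and 2 can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names, the default values of the inertia weight and learning factors, the swarm size and the toy objective are all assumptions. Eq. 2 as printed refers to z_r(h); the sketch assumes that the position before the update is meant, since z_r(h) only becomes available once Eq. 1 has been applied.

```python
import random


def pso_step(z, u, z_pbest, z_gbest, inertia=0.7, l1=1.5, l2=1.5, rng=random):
    """One update of a single particle r, following Eqs. 1 and 2.

    z, u, z_pbest and z_gbest are equal-length lists of floats.
    The default inertia, l1 and l2 values are illustrative assumptions.
    """
    new_z, new_u = [], []
    for zd, ud, pd, gd in zip(z, u, z_pbest, z_gbest):
        v1, v2 = rng.random(), rng.random()  # random values V1, V2 in [0, 1)
        # Eq. 2: inertia term plus pulls towards the personal and global bests.
        ud_new = inertia * ud + l1 * v1 * (pd - zd) + l2 * v2 * (gd - zd)
        new_u.append(ud_new)
        new_z.append(zd + ud_new)  # Eq. 1: move by the new velocity
    return new_z, new_u


def pso_minimize(objective, dim, n_particles=20, n_steps=100, seed=0):
    """Minimize `objective` with a small swarm (hypothetical helper)."""
    rng = random.Random(seed)
    zs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    us = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(z) for z in zs]
    pbest_val = [objective(z) for z in zs]
    g = min(range(n_particles), key=lambda r: pbest_val[r])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for _ in range(n_steps):
        for r in range(n_particles):
            zs[r], us[r] = pso_step(zs[r], us[r], pbest[r], gbest, rng=rng)
            val = objective(zs[r])
            if val < pbest_val[r]:  # update the particle's own best
                pbest[r], pbest_val[r] = list(zs[r]), val
                if val < gbest_val:  # update the swarm's global best
                    gbest, gbest_val = list(zs[r]), val
    return gbest, gbest_val
```

Calling `pso_minimize(lambda p: sum(x * x for x in p), dim=2)` runs the swarm on a toy sphere function; in the setting of this paper the objective would instead score a candidate feature weighting, which the source does not specify in detail.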

In GA, the first and main step is to represent the problem in a form that GA can solve, since it works on binary coding. Chromosomes then evolve gradually through biologically inspired operations. It might be possible to obtain better features from big data, but that takes extra time and computational steps; hence we apply GA to selected features, starting from random generation [13]. After initialization, every chromosome is evaluated by the fitness function, and chromosomes with higher fitness values give better results than unfit ones. Crossover lets the search find efficient paths to a solution and allows chromosomal material from different parents to be combined in a single child. Crossover thus provides a way to introduce new information into the population [13, 14]. Here, GA uses roulette selection to select the best features from the feature sets. Roulette selection is similar to the game of roulette: every feature gets a slice of the wheel, but fitter features get larger slices than the others. In short, features are chosen in proportion to their fitness value:

$$p_{s} = \frac{{S_{i} }}{{\sum\nolimits_{k} {S_{k} } }}$$
(3)
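Roulette selection according to Eq. 3, together with a crossover step, can be sketched as below. The function names are hypothetical, fitness values are assumed to be non-negative, and single-point crossover is an assumption made for illustration; the text does not state which crossover operator is used.

```python
import random


def selection_probabilities(fitness):
    """Eq. 3: share of the wheel given to each candidate."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    return [s / total for s in fitness]


def roulette_select(fitness, rng=random):
    """Return an index drawn with probability proportional to its fitness."""
    pick = rng.random() * sum(fitness)  # where the wheel stops
    running = 0.0
    for i, s in enumerate(fitness):
        running += s
        if pick < running:
            return i
    return len(fitness) - 1  # guard against floating-point round-off


def single_point_crossover(parent_a, parent_b, rng=random):
    """Combine two equal-length binary chromosomes into one child."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the chromosome
    return parent_a[:point] + parent_b[point:]
```

For example, with fitness values `[1, 3]` the second candidate receives three quarters of the wheel and is therefore selected about three times as often as the first.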

After optimization with the genetic algorithm, if all the features are optimized they proceed to the next step; otherwise the optimization with GA is repeated. This procedure is repeated until all the features are optimized. After optimization, the convergence of every feature is checked; if it is acceptable, the features are passed on for learning by neural networks [13, 14].

The optimized features are now passed to neural networks for learning, which is our phase 3. The artificial neural network is inspired by biological neurons: it is a set of neurons in a computer system, similar to the neurons in the human nervous system. These neurons are used to learn patterns and relationships in the given data. In this work a neural network (NN) is used to learn the features optimized by PSO and GA. NNs do not require explicit code for any particular problem, because of their learning rules. These rules help the network gain knowledge from the available dataset and apply this knowledge to problems as required. An NN is a network of nodes, each with an associated node function that computes the output from local parameters; the node function depends on the variation in these local parameters. Hence it is termed an information processing system in which neurons process the information and signals are transmitted through connection links. Each link has a weight, which is multiplied by the incoming signal. A neural net may have a single layer or multiple layers [15] (Fig. 1).

Fig. 1 Labeling of KDD-99 data set

The optimized features are converted into neurons and applied as inputs, such as N1 and N2, with weights M1 and M2. The network may have a single layer or multiple layers of neurons. In the input layer, the raw information, i.e. the optimized features, is fed into the network. In the hidden layer, the input data are evaluated in terms of the weights connecting them to the hidden layer. The output varies with the activity of the neurons in the hidden layer (Fig. 2).

Fig. 2 Optimization process

The net input is calculated as:

$$\text{Net} = N_{1} M_{1} + N_{2} M_{2}$$
(4)

and can be written as

$${\text{Net}}\,{\text{input}} = \sum\nolimits_{m} {N_{m} M_{m} }$$
(5)
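As a small worked example of Eqs. 4 and 5, the net input of a neuron is the weighted sum of its incoming signals. The function name and the numeric values below are illustrative assumptions only.

```python
def net_input(inputs, weights):
    """Eq. 5: sum over m of N_m * M_m for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(n * m for n, m in zip(inputs, weights))


# Two inputs N1, N2 with weights M1, M2, as in Eq. 4.
print(net_input([1.0, 2.0], [0.5, 0.25]))  # 1.0*0.5 + 2.0*0.25 = 1.0
```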

4 Description of Dataset

As discussed above, the experiments are executed using KDD-99 with its 41 features. These features are used for optimization and then learning, and are now analyzed in terms of attacks. In this work we evaluate the accuracy rate of an intrusion detection system. The analysis is based on the number of intrusions. Attacks generally fall into four categories: (1) DoS, (2) Probe, (3) R2L and (4) U2R. In our analysis we use three categories: (1) other attacks, consisting of Probe, R2L and U2R; (2) DoS attacks; and (3) normal traffic (non-attacks). We evaluate the accuracy, precision, recall and F-measure in the following cases:

Case 1: Accuracy, precision, F-measure and recall are evaluated for ANN alone, ANN with GA, ANN with PSO, and ANN with combined GA_PSO; the results are presented in Table 1. In this case we evaluate the efficiency of the IDS by applying ANN individually, with GA, with PSO, or with the hybrid of both algorithms.
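The four evaluation parameters can be computed from the counts of true and false positives and negatives for one class. The sketch below uses the standard definitions of these measures; the function name and the example counts are assumptions and are not taken from Table 1. The false positive rate is included because reducing false alarms is the stated goal of this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-measure and false positive rate.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives for the class being evaluated.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_positive_rate": false_positive_rate,
    }
```

With the illustrative counts `tp=8, fp=2, fn=2, tn=8`, accuracy, precision, recall and F-measure all come out at 0.8 and the false positive rate at 0.2.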

Table 1 Evaluation of the parameters among ANN, ANN with GA, ANN with PSO and ANN with GA_PSO
Table 2 The attack type from KDD CUP 99 dataset

Case 2: In this case, we evaluate the efficiency of ANN alone under three conditions: (a) other attacks, consisting of Probe, R2L and U2R; (b) DoS attacks; and (c) normal (non-attack) conditions. Similarly, we evaluate the efficiency of ANN with PSO, ANN with GA and, finally, ANN with GA_PSO. Here we evaluate the efficiency of the algorithms in terms of accuracy, precision, recall and F-measure. The attacks considered are as follows. DoS stands for denial of service; in a DoS attack, hidden attacks are carried out by a user and show up in the system. This type of attack may be carried out by a single intruder or by a group of intruders, and it makes the system unavailable to its real users. A probe attack is one in which the intruder tries to break the security by trial methods. R2L stands for remote-to-user attack. Finally, in a U2R attack the intruder starts on the system as a normal user and disrupts all the activities of the system [16] (Fig. 3).
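The grouping of connection labels into the three classes used in this analysis can be sketched as a simple lookup. The attack names listed below are the ones commonly given for the KDD Cup 99 training data and should be treated as an assumption to check against the copy of the dataset in use; the function name and the class strings are hypothetical.

```python
# Assumed KDD Cup 99 training labels, grouped by attack category.
DOS = {"back", "land", "neptune", "pod", "smurf", "teardrop"}
PROBE = {"ipsweep", "nmap", "portsweep", "satan"}
R2L = {"ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
       "warezclient", "warezmaster"}
U2R = {"buffer_overflow", "loadmodule", "perl", "rootkit"}


def three_class_label(raw_label):
    """Map a raw connection label to 'normal', 'dos' or 'other'."""
    name = raw_label.strip().rstrip(".").lower()  # raw labels may end with '.'
    if name == "normal":
        return "normal"
    if name in DOS:
        return "dos"
    if name in PROBE or name in R2L or name in U2R:
        return "other"  # Probe, R2L and U2R are merged into one class
    raise ValueError(f"unknown label: {raw_label!r}")
```

Labels outside these sets raise an error instead of being assigned silently, so attack types missing from the lookup are noticed.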

Fig. 3 Layer of MNN

4.1 Experimental Results

In this section we analyze the statistical data by simulation. The results of Tables 1 and 3 are presented graphically through simulation (Fig. 4; Table 2).

Table 3 The static data to analyze the efficiency of the approaches for above discussed attack
Fig. 4 Various layers in neural networks

Figure 5 shows the simulated analysis of Table 1 in terms of accuracy, precision, recall and F-measure. The figure compares the efficiency of all four algorithms: ANN (green line), ANN with PSO (purple line), ANN with GA (red line), and ANN with both PSO and GA (blue line). The analysis demonstrates that ANN with both PSO and GA gives better results on all four parameters (accuracy, precision, recall and F-measure).

Fig. 5 Simulated graph of Table 1

Observation 1: Figure 5 shows the parameter analysis of the different classifiers and the proposed approach. Parameters such as precision, recall, accuracy and F-measure vary with the classifier, but it is clear that the proposed approach (PSO with GA in a neural network) shows significant improvement in all parameters. Considering the proposed approach alone, recall shows a more significant improvement than the other parameters. This clearly indicates a reduced false negative rate, and attack identification is effective in the proposed approach because of the optimized weights given by the PSO_GA approach.

Figures 6 and 7 give the analytic results in terms of accuracy (black line), recall (red line), precision (blue line) and F-measure (green line) for the approaches (ANN and ANN with GA in Fig. 6; ANN with PSO and ANN with PSO_GA in Fig. 7) on the three classes (DoS attack, other attack and normal, i.e. non-attack). All four graphs demonstrate the better efficiency of ANN with PSO_GA.

Fig. 6 Analysis with ANN and ANN_GA

Fig. 7 Analysis with ANN_PSO and ANN with PSO_GA

Observation 2: Figure 6 shows an in-depth analysis of all three classes for ANN and ANN_GA. In this analysis we try to show the significance of our approach; the discussion continues in Observation 3. First, for the normal class (which contains no attack) and the other-attack class, accuracy is high relative to the other parameters (precision, recall and F-measure) for both ANN and ANN with GA, but ANN_GA still has better accuracy than ANN. Feature weighting by optimization therefore performs well, apparently because it reduces the learning of overlapping information. For the DoS attack class alone, ANN with GA also shows higher accuracy, so we conclude that optimized feature weighting is the better approach. How the optimization is improved further is discussed in the next part.

Finally, from the whole analysis it can be concluded that ANN with PSO_GA gives better results for all the attacks examined in our work.

Observation 3: The analysis in Fig. 7 continues from Observation 2 and looks for the significance of optimization and the improvement in detecting the different classes by classification. Both graphs show effective recall, but only for the normal class, which reduces the false positive rate. Some improvement can also be seen for the other classes, i.e. DoS attacks and other attacks. The most effective results, however, are seen in the proposed approach, where detection is improved for other attacks as well. PSO optimization is therefore effective, but PSO with GA, i.e. the proposed approach, gives further improved results for the other-attack and normal classes.

5 Conclusion

This paper investigates the optimization of feature weights, how to reduce the statistical overlap between features, and whether improving the optimization by hybridization is worthwhile. The proposed approach uses optimization-based features with an artificial neural network, and the experiments show that this hybrid approach improves not only the cumulative accuracy, precision, recall and F-score but also each of these parameters individually for the three classes (i.e. DoS attack, normal and other attacks).