1 Introduction

With the ever-growing use of computer networks, the problem of data safety and security increases day by day. Attacks, also called intrusions, can cause serious damage to our systems and data. An IDS helps guard against such attacks and keeps the system free of them. Earlier, the methods used were encryption, firewalls, virtual private networks, etc. Nowadays these cannot be trusted entirely: such techniques do not suffice to protect our data, which is why there is a need today for technology that can investigate and analyze the wrongful and illegal activities occurring in our systems and networks.

Hence, a dynamic approach called an intrusion detection system (IDS) has been designed for the security of our networks. It is of two types: signature-based IDS and anomaly-based IDS [1]. Depending on the environment in which it is deployed, it is further categorized as host-based or network-based: a host-based IDS monitors the behavior of the host system, while a network-based IDS monitors the behavior of the network.

Lee et al. [1] in 1999 constructed a classification model with 41 attributes from the unprocessed traffic collected at MIT Lincoln Laboratory [2], producing the KDD Cup 1999 dataset (from the 1999 Knowledge Discovery and Data Mining Tools Competition).

2 Data Mining in IDS

Data mining is a broad concept most commonly used in computer science. It is the process of extracting new, valid, meaningful and significant information from large databases. In the last few years, data mining techniques such as classification, clustering and association rules have been used successfully to find intruders. Business organizations and commercial accountants were its main users, but it is now increasingly used in research to extract valuable information from experiments and observations [3]. Because of the heavy use of computer networks, a large volume of existing and newly arriving data on the network needs to be processed, which is why data mining-based IDS has gained attention in research. Data mining is used to uncover hidden patterns of intrusion and the relationships concealed in the data [4]. It is used to detect attack variants, control false alarms and improve efficiency [5].

3 WEKA Tool

WEKA is a tool for performing data mining and machine learning tasks. It is a collection of machine learning and data mining algorithms covering, in particular, classification, data preprocessing, regression, feature selection and visualization. It is written in Java and is used extensively in research. It provides 49 data preprocessing tools, 76 classification algorithms, 15 attribute evaluators and 10 search algorithms for feature selection. All input files must be in ARFF format.
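As an illustration (not part of the original paper), the minimal sketch below loads an ARFF file through WEKA's Java API and reports its dimensions; the file name kdd99.arff is a placeholder.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadArff {
    public static void main(String[] args) throws Exception {
        // Load an ARFF file (the path here is a placeholder).
        Instances data = new DataSource("kdd99.arff").getDataSet();

        // WEKA convention: the class attribute is typically the last one.
        data.setClassIndex(data.numAttributes() - 1);

        System.out.println("Instances:  " + data.numInstances());
        System.out.println("Attributes: " + data.numAttributes());
        System.out.println("Class:      " + data.classAttribute().name());
    }
}
```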

4 KDDCup 1999

KDDCup99 is the oldest and most commonly used dataset for assessing anomaly detection. It distinguishes good connections, called normal, from bad connections, called intrusions. Nine weeks of data were collected for the training and test sets. The dataset contains about 490,000 instances, and every instance has 41 features and is labeled as either attack or normal [6]. These features are categorized into three groups: basic, traffic and content. There are 24 training attack types, which are grouped as follows:

  • Denial of Service: The attacker sends so many requests to the server, keeping its memory and other resources so busy, that requests from genuine users to access the machine are denied, e.g., smurf attack, land attack.

  • R2L (Remote to Local): The attacker finds a way to gain access to the machine by compromising its security, e.g., by guessing passwords.

  • U2R (User to Root): The attacker already has local access to the system and attempts to gain administrator (root) privileges, e.g., buffer overflow attack.

  • Probe: The attacker gathers information about the victim machine in order to take advantage of its weaknesses, e.g., port scanning.

5 Related Work

The most important task in network security is to build an effective NIDS, and experts have done a great deal of work in this field. The work divides into three parts: first, find the important and relevant features in the attribute set; then improve the learning algorithms; and finally assess the performance on a dataset. Li et al. [7] proposed an IDS in which they ranked the required features by applying the information gain and chi-square methods. The outcome of this work indicates that detection accuracy is maintained even when only some selected attributes are used. In [8], the authors used PCA to select features, retaining the features with the highest eigenvalues. In [9], the authors show that with adequate training parameters and the right feature selection, high accuracy can be achieved and the performance of an IDS can be improved. In [10], the authors used feature-ranking algorithms to reduce the feature space, employing three ranking algorithms based on the support vector machine (SVM), multivariate adaptive regression splines (MARS) and linear genetic programs (LGP). In [11], the authors propose an "Enhanced Support Vector Decision Function" for feature selection that is based on two essential factors: first, the feature's rank, and second, the correlation among the features. In [12], the authors propose an automatic feature selection procedure based on correlation-based feature selection (CFS).

6 Algorithms Applied

6.1 J48

J48 is an open-source algorithm in WEKA for building decision trees, introduced by Ross Quinlan (as C4.5). It is a supervised learning algorithm, and the decision trees it produces are used for classification. During tree construction, it first checks whether all instances belong to the same class; if so, the tree is a leaf labeled with that class. Otherwise, the information gain is calculated for every attribute, and the best attribute is chosen for branching the tree [13,14,15].

Entropy is used to calculate information gain [16]. The entropy E of a sample S is

$$E(S) = - \sum\limits_{i = 1}^{n} P_{i} \log_{2} P_{i}$$
(1)

The gain Gain(S, X) of attribute X w.r.t. the total sample S is

$${\text{Gain}}(S,X) = E(S) - \sum\limits_{j \in {\text{values}}(X)} \frac{{|S_{j} |}}{|S|}E(S_{j} )$$
(2)

The split information and the gain ratio are then calculated as

$${\text{SplitInfo}}(S,X) = - \sum\limits_{k = 1}^{c} \frac{{|S_{k} |}}{|S|}\log_{2} \frac{{|S_{k} |}}{|S|}$$
(3)
$${\text{Gain}}\,{\text{Ratio}}(S,X) = \frac{{{\text{Gain}}(S,X)}}{{{\text{SplitInfo}}(S,X)}}$$
(4)
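To illustrate Eqs. (1)-(4), the following sketch computes entropy, gain, split information and gain ratio for a small hypothetical two-class split; the counts are invented for illustration and are not taken from the experiments.

```java
import java.util.Arrays;

public class InfoGain {
    // Entropy of a class distribution, Eq. (1): E = -sum p_i * log2(p_i)
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double e = 0.0;
        for (int c : counts) {
            if (c == 0) continue;              // 0 * log(0) is taken as 0
            double p = (double) c / total;
            e -= p * (Math.log(p) / Math.log(2));
        }
        return e;
    }

    public static void main(String[] args) {
        // Toy sample S: 9 attacks, 5 normals (illustrative counts).
        int[] s = {9, 5};
        // Splitting S on a hypothetical attribute X with two values -> S1, S2.
        int[][] parts = {{6, 2}, {3, 3}};

        int total = 14;
        double gain = entropy(s);              // start from E(S), Eq. (1)
        double splitInfo = 0.0;
        for (int[] sj : parts) {
            int nj = Arrays.stream(sj).sum();
            double w = (double) nj / total;    // |Sj| / |S|
            gain -= w * entropy(sj);           // Eq. (2)
            splitInfo -= w * (Math.log(w) / Math.log(2)); // Eq. (3)
        }
        System.out.println("Gain(S,X)      = " + gain);
        System.out.println("GainRatio(S,X) = " + gain / splitInfo); // Eq. (4)
    }
}
```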

6.2 Random Forest

Random forest is a supervised learning algorithm used for both classification and regression, though most commonly for classification. It builds a decision tree from each sample of the data and takes the prediction of every tree; voting is then performed over the predicted results, and the prediction with the most votes is selected as the final result. The algorithm reduces the problem of overfitting to some extent, works efficiently with large amounts of data, and can maintain high accuracy even when a large proportion of the data is missing.
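A minimal sketch of how the two classifiers can be trained and timed through WEKA's Java API is shown below; the dataset path and the use of 10-fold cross-validation are assumptions made for illustration, not details taken from the paper.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareTrees {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("kdd99.arff").getDataSet(); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] models = { new J48(), new RandomForest() };
        for (Classifier model : models) {
            long start = System.currentTimeMillis();
            Evaluation eval = new Evaluation(data);
            // 10-fold cross-validation; WEKA rebuilds the model for each fold.
            eval.crossValidateModel(model, data, 10, new Random(1));
            long ms = System.currentTimeMillis() - start;

            System.out.println(model.getClass().getSimpleName()
                + ": accuracy = " + eval.pctCorrect() + "%, time = " + ms + " ms");
        }
    }
}
```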

7 Experimental Result

7.1 Parameter

True positive (TP): an attack that is correctly identified as an attack.

False positive (FP): a normal connection that is incorrectly predicted as an attack.

Precision: the percentage of positive predictions that are correct.

Time: the time taken to build the model.

$${\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$

$${\text{Precision}} = \frac{TP}{TP + FP} \times 100$$
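As a hypothetical worked example (the counts below are invented and are not taken from Table 1), these two formulas can be computed as follows:

```java
public class Metrics {
    // Accuracy as a percentage, from confusion-matrix counts.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return 100.0 * (tp + tn) / (tp + tn + fp + fn);
    }

    // Precision as a percentage: share of positive predictions that are correct.
    static double precision(int tp, int fp) {
        return 100.0 * tp / (tp + fp);
    }

    public static void main(String[] args) {
        // Illustrative counts: 950 TP, 40 TN, 5 FP, 5 FN.
        System.out.println("Accuracy  = " + accuracy(950, 40, 5, 5));  // 99.0
        System.out.println("Precision = " + precision(950, 5));        // ~99.48
    }
}
```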

7.2 Results

Table 1 shows the comparison of J48 and random forest with varying numbers of attributes. As the table shows, reducing the attributes makes no significant difference to the TP, FP and precision results, but there is a large difference in the time taken to build the model. Since we obtain nearly identical, and arguably better, results with a smaller number of attributes, there is no reason to use all 42 attributes (Figs. 1, 2, 3 and 4).
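Since the reduced attribute set is obtained with information gain (see Sect. 8), one plausible way to reproduce such a reduction through WEKA's Java API is sketched below; the dataset path and the choice of keeping the top 9 ranked attributes are assumptions for illustration.

```java
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class SelectAttributes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("kdd99.arff").getDataSet(); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Rank attributes by information gain and keep the 9 best ranked.
        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new InfoGainAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(9);
        filter.setSearch(ranker);
        filter.setInputFormat(data);

        Instances reduced = Filter.useFilter(data, filter);
        // numAttributes() counts the 9 kept attributes plus the class attribute.
        System.out.println("Attributes kept: " + reduced.numAttributes());
    }
}
```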

Table 1 Comparison of J48 and random forest

Fig. 1 J48 algorithm with 9 attributes

Fig. 2 Random forest algorithm with 9 attributes

Fig. 3 J48 algorithm with 42 attributes

Fig. 4 Random forest with 42 attributes

8 Conclusion

In this paper, we have used information gain to extract attributes for intrusion detection. To classify with these extracted attributes, we compared the J48 and random forest decision tree algorithms. The performance parameters show some improvement when only 8 attributes are used compared to all 41 attributes, and the most significant difference is in the time to build the model, which improves remarkably with fewer attributes. The experiment was performed in the WEKA tool using the KDDCup99 dataset. According to the results, J48 with 9 attributes gives the better results. Still, more work is required in this field; in the near future, we propose to apply other classification algorithms besides J48 and random forest to varied datasets.