1 Introduction and Related Work

Due to the expansion of computer networks, the number of intrusion incidents is increasing steadily. This challenges network administrators to prevent unauthorized access to confidential and privileged information, and it has led many researchers to focus on constructing intrusion detection systems (IDSs). IDS approaches can be characterized by how they treat network access events in the context of system security management. The two common approaches are misuse detection and anomaly detection [1, 12, 31, 34]. In the former, detection is based on observed behavior that matches a predefined pattern of events describing known attacks. The latter approach aims at detecting user behavior that deviates from a normal profile [5]. The drawback of misuse detection is that it cannot detect previously unseen attacks. To overcome this problem, anomaly detection is used, which can flag an unseen attack by detecting any deviation from the learned normal profile of a user.

To resolve the disadvantages of these two intrusion detection approaches, several hybrid IDSs have recently been proposed by combining different machine learning techniques. Pfahringer [23], the winner of KDDCup99, developed a cost-effective bagged boosting algorithm by integrating C4.5 decision trees, while Levin [16] built an optimal decision forest using the Kernel Miner technique. Both achieve acceptable detection rates for Normal, DoS and Probe attacks but poor rates for R2L and U2R attacks. Depren et al. [5] suggested a hybrid IDS consisting of an anomaly detection component, a misuse detection component and a decision support system. To improve the detection rate further, Peddabachigari et al. [22] designed an IDS by fusing decision trees and support vector machines, while Xiang et al. [35] constructed a multiple-level tree classifier containing three levels of decision tree classification. However, these two methods suffer from low detection rates for unknown attacks.

AdaBoost-based intrusion detection systems improve the attack detection rate by boosting the output of a classifier through adjustment of the learning weights [7, 11, 13, 21, 33]. These approaches promise a higher detection rate with a minimal false alarm rate, but suffer from high resource consumption and slow training and testing. To overcome these issues, many researchers have proposed feature selection techniques.

Feature selection (FS) is defined as the process of choosing an optimal, prominent subset of features that represents the entire dataset [27, 28]. Principal Component Analysis (PCA) is one of the most widely used feature reduction techniques for data analysis and compression. Venkatachalam and Selvan [29] applied three feature reduction techniques, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Independent Component Analysis (ICA), to the KDDCup99 dataset. These techniques perform a linear mapping of the data to a lower-dimensional space such that the variance of the data in the low-dimensional representation is maximized. The datasets with reduced attributes were fed to Binary tree, ART and LAMSTAR neural network classifiers with improved results.

Parallel computing, in which computations are carried out concurrently, can be a good solution for feature selection and classification. Task parallelism runs different tasks in parallel, while data parallelism executes the same process on multiple data partitions. Both mechanisms have been used to parallelize the feature selection process. Mohammad et al. [20] distributed the entire dataset across multiple nodes of a cluster and applied a feature selection algorithm to obtain sufficiently good subsets of attributes in a high-dimensional search space, although only for synthetic datasets. Deng et al. [4] decomposed a whole high-dimensional dataset into many sub-tables and suggested an approach for parallel feature selection from a series of decision subsystems.

Recently, a simple parallel computation approach called MapReduce [3] has been used by many researchers; it is a highly scalable parallel programming model for data-intensive and computation-intensive applications on clusters of commodity machines. MapReduce has been successfully applied in data mining [10, 26, 30] and machine learning [2, 9, 14, 17, 18, 24, 37].

The salient features of the MapReduce model are as follows. It hides many system-level details, such as hardware heterogeneity and platform differences, from the user and parallelizes the computation across large clusters of machines without human intervention. The model also handles machine failures in the cluster and schedules communication among the machines to make efficient use of the network and storage media.

The main contributions of this paper can be enumerated as follows.

  1.

    A parallel version of the Binary Bat Algorithm (PBBA), built on the iterative MapReduce programming model, to select a prominent subset of features.

  2.

    A distributed classification model using the Naive Bayes algorithm to improve attack classification accuracy and attack detection time.

The subsequent sections of the paper are organized as follows. Section 2 presents the sequential Bat and Binary Bat algorithms. The MapReduce-based Parallel Binary Bat algorithm is presented in Sect. 3. The sequential and parallel Naive Bayes classification algorithms are presented in Sect. 4. Section 5 presents the experimental results and discussion, while Sect. 6 gives the conclusions of the proposed work.

2 Overview of the Bat Algorithms

2.1 Bat Algorithm

The Bat Algorithm (BA) is based on the echolocation behavior of bats. Typically, a bat emits a short pulse of sound and receives its echo a fraction of a second later, which it uses to determine its distance from an object. Bats can also distinguish between an obstacle and prey, which helps them hunt in complete darkness [36]. In BA, an artificial bat has position, velocity and frequency vectors that are updated over the course of the iterations; the artificial bats move around the search space by updating position and velocity within the continuous real domain.

Each bat \(b_{i}\) has a position \(x_{i}\), frequency \(f_{i}\) and velocity \(v_{i}\). At each time step t, a bat moves to its next position with a new velocity given by

$$\begin{aligned} v_{i} \left( {t+1} \right) =v_{i} \left( t \right) +\left( {x_{i} \left( t \right) -gbest} \right) f_i \end{aligned}$$
(1)

where gbest is the best solution obtained so far. After computing the velocity, the position of the bat is updated as follows.

$$\begin{aligned} x_{i} \left( {t+1} \right) =x_{i} \left( t \right) +v_{i} \left( {t+1} \right) \end{aligned}$$
(2)

The frequency of the ith bat is computed in each iteration as follows.

$$\begin{aligned} f_{i}=f_{min} +\left( {f_{max} -f_{min} } \right) \beta \end{aligned}$$
(3)

where \(\beta \) is a random number uniformly distributed in the range [0,1]. The exploitation capability of BA is improved by using a random walk method which is given by

$$\begin{aligned} x_{new} =x_{old} +\varepsilon A^{\prime } \end{aligned}$$
(4)

where \(\varepsilon \) is a random number in [−1, 1] and \(A^{\prime }\) is the loudness of the emitted sound. The loudness and pulse emission rate are adjusted in each iteration as follows.

$$\begin{aligned} A_{i} \left( {t+1} \right)= & {} \alpha \, A_{i} \left( t \right) \end{aligned}$$
(5)
$$\begin{aligned} r_{i} \left( {t+1} \right)= & {} r_{i} \left( 0 \right) \left[ {1-\exp \left( {-\gamma t} \right) } \right] \end{aligned}$$
(6)

where \(\alpha \) and \(\gamma \) are constants.
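For concreteness, Eqs. (1)–(6) can be combined into a single update step. The following minimal Python sketch is illustrative only: the array shapes, the initial pulse rates \(r_i(0)\) passed as r0, and the default values \(\alpha = \gamma = 0.9\) are assumptions of this example, not parameters prescribed by BA.

```python
import numpy as np

def ba_step(x, v, gbest, A, r0, t,
            f_min=0.0, f_max=1.0, alpha=0.9, gamma=0.9):
    """One iteration of the basic Bat Algorithm (Eqs. (1)-(6)).

    x, v : (n_bats, dim) position and velocity matrices
    gbest: (dim,) best position found so far
    A, r0: (n_bats,) loudness values and initial pulse rates
    """
    n_bats, dim = x.shape
    beta = np.random.rand(n_bats, 1)           # uniform in [0, 1]
    f = f_min + (f_max - f_min) * beta         # Eq. (3): frequencies
    v = v + (x - gbest) * f                    # Eq. (1): velocity update
    x = x + v                                  # Eq. (2): position update
    r = r0 * (1.0 - np.exp(-gamma * t))        # Eq. (6): pulse emission rate
    # Eq. (4): local random walk around the new solutions, applied to the
    # bats whose pulse-rate test fires
    walk = np.random.rand(n_bats) > r
    eps = np.random.uniform(-1.0, 1.0, (int(walk.sum()), dim))
    x[walk] = x[walk] + eps * A[walk, None]
    A = alpha * A                              # Eq. (5): loudness decay
    return x, v, A, r
```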

2.2 Binary Bat Algorithm

In the feature selection problem, each feature subset is coded as a binary string of 1s and 0s, where 1 denotes the presence and 0 the absence of a feature. To map the problem onto BA, each bat's position is treated as a binary vector, creating a binary search space in which a bat moves to a new position by flipping bits. In the continuous version of BA, the artificial bats move around the search space by adding velocities to positions within the continuous real domain. In a discrete binary space, however, position updating has a different meaning: a bat's position takes only the binary values '0' and '1', and the update cannot be performed as in the continuous BA. Hence, the binary version of BA (BBA) employs a different strategy, using a transfer function to switch a bat's position from '0' to '1' or vice versa [15, 19, 25]. The transfer function for updating the bat position in the binary space is

$$\begin{aligned} V\left( v_i^k \right) =\left| \frac{2}{\pi }\arctan \left( \frac{\pi }{2} v_i^k \right) \right| \end{aligned}$$
(7)

where \(v_i^k \) is the velocity of bat 'i' in the kth dimension. The position of the bat is then modified as

$$\begin{aligned} x_i^k \left( {t+1} \right) =\left\{ {{\begin{array}{lll} 0&{}\quad \text {if} &{}\quad V\left( {v_i^k \left( {t+1} \right) } \right) >\delta \\ 1&{}\quad \text {if} &{}\quad V\left( {v_i^k \left( {t+1} \right) } \right) <\delta \end{array} }} \right. \end{aligned}$$
(8)

where \(\delta \) is a random number uniformly distributed in [0, 1]. After updating the position, the fitness of each bat is evaluated.
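A minimal sketch of this binary update (Eqs. (7)–(8)), assuming NumPy and a velocity matrix with one row per bat; the helper name and the random draw of \(\delta \) are assumptions of this example.

```python
import numpy as np

def bba_position_update(v, delta=None):
    """Binary position update of BBA (Eqs. (7)-(8)).

    v: (n_bats, n_features) velocity matrix. Returns a 0/1 matrix in
    which 1 means the corresponding feature is selected."""
    V = np.abs((2.0 / np.pi) * np.arctan((np.pi / 2.0) * v))   # Eq. (7)
    if delta is None:
        delta = np.random.rand(*v.shape)       # delta ~ U[0, 1]
    return np.where(V > delta, 0, 1)           # Eq. (8)
```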

2.3 Fitness Function for BBA

Feature selection methods may be grouped into filter-based and wrapper-based approaches. The filter approach relies on general characteristics of high-dimensional datasets, giving fast evaluation and selection of feature subsets, but it does not take the mining algorithm into account. The wrapper approach, by contrast, is an optimization procedure that adds or removes features to construct candidate feature subsets and then employs a mining algorithm to evaluate each subset.

The present work performs unsupervised feature selection with the k-means clustering algorithm in a wrapper approach, grouping a given dataset into a certain number of clusters. The goal of the k-means algorithm is to find k points of the dataset that serve as the centroids of the k clusters. In particular, the k-means algorithm groups 'n' data points into 'k' disjoint subsets \(s_{j}\), each containing \(n_j \) data points, so as to minimize the mean square error (MSE), given as

$$\begin{aligned} \hbox {Mean Square Error (MSE)} = \mathop \sum \nolimits _{j=1}^k \mathop \sum \nolimits _{n\in s_j } \left\| {x_n -\mu _j } \right\| ^{2} \end{aligned}$$
(9)

where \(x_n \) is the feature vector of the nth data point and \(\mu _j \) is the geometric centroid of the data points in \(s_j \). Thus, fitness is evaluated from the cluster quality as measured by the MSE.
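This wrapper fitness can be sketched with scikit-learn's KMeans, whose inertia_ attribute is the sum of squared point-to-centroid distances in Eq. (9). The cluster count k and the helper name bat_fitness are assumptions of this example; the actual system evaluates the same quantity inside MapReduce tasks (Sect. 3).

```python
import numpy as np
from sklearn.cluster import KMeans

def bat_fitness(data, position, k=5, seed=0):
    """Wrapper fitness of a bat: k-means MSE (Eq. (9)) on the feature
    subset encoded by the bat's binary position vector."""
    selected = np.flatnonzero(position)    # indices of features set to 1
    if selected.size == 0:
        return np.inf                      # empty subset: worst fitness
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(data[:, selected])
    # inertia_ is the sum of squared distances of the points to their
    # closest centroids, i.e. the quantity minimized in Eq. (9)
    return km.inertia_
```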

3 Hadoop Based Parallel BBA (HPBBA)

Hadoop is an open-source framework of the Apache Software Foundation, modeled on Google's MapReduce and Google File System designs; it provides a fault-tolerant and reliable environment for parallel algorithms and lets programmers place the components of a distributed application across networked computers efficiently. To handle large volumes of data, the framework provides a scalable file system, the Hadoop Distributed File System (HDFS), which stores very large files as blocks spread across the machines of a large cluster of commodity hardware.

In general, nature-inspired computing algorithms need substantial computational resources to find an optimal solution within exhaustive search spaces. To reduce the computation time and memory requirement, the binary BA is implemented in parallel on Hadoop. Nature-inspired algorithms can be articulated as iterative procedures whose per-iteration operations, fitness evaluation over the distributed data, finding the local optimum and updating the global solution, map naturally onto this framework. We employ an extension of the MapReduce programming model, known as iterative MapReduce, which supports iteration as a primary construct. The proposed iterative MapReduce model is used for feature selection on the KDDCup99 dataset and is illustrated in Fig. 1.

Fig. 1 Iterative MapReduce model

Fig. 2 Mapper–Chainer model for Parallel Binary Bat algorithm

Some observations about the iterative MapReduce programming model: (1) there is a one-to-one correspondence between map tasks and reduce tasks, since the execution unit of both is a processor core; (2) each iteration is expressed as exactly one MapReduce job; and (3) the locality optimization provided by the Hadoop framework reduces the effort of loading DFS data into a map task at each iteration.

The proposed feature selection methodology, shown in Fig. 2, adopts the Mapper–Chainer model with two parts: the first is a single MapReduce job responsible for bat initialization, and the second is an iterative MapReduce job that determines the best solution.

Part-1: Bat initialization. The bat search space is viewed as an area containing many food/prey sources. Since the best source locations are not known in advance, an initial population is generated randomly as integer-valued vectors for a number of bats '\(\hbox {n}_\mathrm{b}\)' and dimension 'd'. The bats are initialized as per Algorithm 1, with the minimum frequency set to 0 \((\hbox {f}_{\mathrm{min}}=0)\) and the maximum frequency set to 1 \((\hbox {f}_{\mathrm{max}}=1)\). The pulse rate \(\hbox {r}_{\mathrm{i}}\) and loudness \(\hbox {A}_{\mathrm{i}}\) of each bat are also initialized. After the bat parameters have been initialized, the initial fitness value of each bat is calculated and the current best bat is obtained.

Algorithm 1 Bat initialization

The dataset 'D' is partitioned into multiple 64 MB input files, stored by HDFS across the data nodes of a Hadoop cluster. After the bat population has been randomly initialized throughout the search space and the best bat obtained, (composite key, values) pairs are produced as inputs for the iterative MapReduce part. In complex operations where several parameters are used, the basic (key, value) pair is not sufficient; hence, a composite key is created using a WritableComparable interface object of the Hadoop framework.

Part-2: Iterative MapReduce. The map tasks of an iteration start after bat initialization completes. Each map function sequentially reads its partition of the dataset from its local input split in the format (composite key, values). Multiple map functions run in parallel, and the fitness value \((\hbox {f}_{\mathrm{p}}(\hbox {b}_{\mathrm{i}}))\) of each bat is computed on the given partition of the dataset. The pseudo code of HPBBA-Map is given in Algorithm 2.

Algorithm 2 HPBBA-Map

Next, the fitness values emitted by the map functions of the 'm' different partitions are summed in the combiner for the '\(\hbox {n}_{\mathrm{b}}\)' bats, and the average fitness value of each bat over the entire dataset \((\hbox {f}_{\mathrm{D}}(\hbox {b}_{\mathrm{i}}))\) is obtained as per Algorithm 3.

Algorithm 3 HPBBA-Combiner

Finally, the (composite key, values) pairs emitted by the combiner are processed by Algorithm 4: the next position of each bat is updated and a new solution is generated. The best among the bats' solutions is stored in the Gbest state variable, and each updated bat with its fitness is passed on to the next iteration.

Algorithm 4 HPBBA-Reducer

Part-2 of Fig. 2 is executed until the termination criteria are met. The pseudo code of the proposed feature selection methodology is given in Algorithm 5.

Algorithm 5 The proposed HPBBA feature selection methodology

It can be seen that the various stages of the Binary Bat algorithm combine with iterative MapReduce in a straightforward manner.
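To make the data flow concrete, the following single-process Python mock traces one HPBBA iteration over 'm' partitions. It only imitates the map, combine and reduce stages of Algorithms 2–4; bat_fitness and bba_position_update refer to the sketches in Sects. 2.2–2.3, and nothing here is actual Hadoop code.

```python
import numpy as np

def hpbba_iteration(partitions, positions, velocities, gbest):
    """One HPBBA iteration over 'm' dataset partitions (single-process
    mock of Algorithms 2-4: map -> combine -> reduce)."""
    n_bats = positions.shape[0]
    # Map stage (Algorithm 2): per-partition fitness of every bat
    partials = [[bat_fitness(part, positions[i]) for i in range(n_bats)]
                for part in partitions]
    # Combine stage (Algorithm 3): average fitness over all partitions
    f_D = np.mean(np.asarray(partials), axis=0)
    # Reduce stage (Algorithm 4): track the global best, then move bats
    best = int(np.argmin(f_D))
    if f_D[best] < gbest["fitness"]:
        gbest = {"position": positions[best].copy(),
                 "fitness": float(f_D[best])}
    f = np.random.rand(n_bats, 1)                                  # Eq. (3)
    velocities = velocities + (positions - gbest["position"]) * f  # Eq. (1)
    positions = bba_position_update(velocities)                # Eqs. (7)-(8)
    return positions, velocities, gbest
```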

3.1 Time Complexity of HPBBA

The running time of HPBBA is the total time required to initialize the bat population (Algorithm 1) and to perform the iterative MapReduce operation. The time complexity of initializing the bat population is \(\hbox {T}_{\mathrm{init}} =\hbox {t}_{\mathrm{k}} *{\hbox {m}} *\left( {2\hbox {nkd}+\hbox {nk}+\hbox {nd}} \right) /\hbox {n}_{\mathrm{c}}\), where \(\hbox {n}\) is the number of instances, \(\hbox {d}\) is the number of attributes, \(\hbox {k}\) is the number of clusters, \(\hbox {t}_{\mathrm{k}}\) is the number of iterations required for k-means to converge, \(\hbox {n}_{\mathrm{c}}\) is the number of cores in the cluster and \(\hbox {m}\) is the number of partitions of the dataset.

The time complexity of the map function in part-2 of iterative MapReduce is \(T_{map} =n_b\,*\,t_k\,*\,\left( {2nkd+nk+nd} \right) /n_c \), where \(n_b \) is the number of bats. The combiner function requires \(T_{com} =n_b *c\) time and the reducer takes a time of \(T_{red} =n_b *r\). Here c is the number of combiners and r is the number of reducers. The total running time ‘T’ of HPBBA is \(T_{init} +T_{map} +T_{com} +T_{red} \).

In the total running time 'T', the map function, which uses k-means clustering to cluster the instances of the dataset, requires the largest share, \(T_{map} \); this term grows linearly with the number of attributes as well as the number of instances. Assuming that the node communication time, which depends on the quality of the network, is constant, \(T_{init} +T_{com} +T_{red} \) can be neglected; the running time is then governed by \(T_{map}\) alone, and the time complexity of HPBBA is \(O\left( {n_b *t_k *\left( {2nkd+nk+nd} \right) /n_c } \right) .\)
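As a purely illustrative check of this formula, the dominant term can be evaluated for assumed parameter values (the numbers below are hypothetical, not measurements from this work):

```python
# Dominant term of HPBBA's running time, evaluated for assumed values.
n, d, k = 5_000_000, 41, 5      # instances, attributes, clusters (assumed)
t_k, n_b = 20, 30               # k-means iterations, bats (assumed)

def t_map(n_c):                 # T_map up to a constant machine factor
    return n_b * t_k * (2*n*k*d + n*k + n*d) / n_c

for n_c in (1, 16, 64):
    print(f"cores={n_c:2d}  relative T_map={t_map(n_c) / t_map(1):.4f}")
# Prints 1.0000, 0.0625, 0.0156 -- T_map scales as 1/n_c.
```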

4 Naive Bayesian Classification

A Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with strong independence assumptions. It assumes that, conditioned on the class variable (e.g., a normal or attack category in the dataset), the presence of a particular attribute is unrelated to the presence of any other attribute.

The Naive Bayes classification model is trained on a known training dataset. Maximum likelihood estimation over the occurrences of events is used to estimate the required parameters of the Naive Bayes model; this characteristic lets a user apply the model without working through Bayesian probability theory or other Bayesian machinery in practical applications [6]. The following features of the Naive Bayes classifier make it very useful in many research domains.

  • Each class-conditional distribution can be estimated independently as a one-dimensional distribution.

  • Computational cost is minimal.

  • It handles missing values and noise in the dataset efficiently.

  • The variance is very low because little searching is involved.

  • Naive Bayes supports incremental learning, so new training samples are learned quickly.

The Parallel Naive Bayes Classifier (PNBC) is implemented as a combination of map and reduce functions. The pseudo code of the map function is shown in Algorithm 6. Each map function sequentially reads its vertically partitioned portion of the dataset from its local input split in the format (composite key, values). Multiple map functions run in parallel and compute the conditional probability values for the given vertical partition.

Algorithm 6 PNBC map function

Next, the conditional probability values emitted by the different map functions for the 'm' partitions and 'd' attributes are multiplied in the reducer for each class over the entire dataset. The pseudo code of the reducer function is shown in Algorithm 7. The class with the maximum resulting conditional probability is chosen as the class label for the given input X.

Algorithm 7 PNBC reducer function

As the above algorithms show, the parallel Naive Bayes classifier evaluates the conditional probabilities \(p(C_{j}{\vert }X_{i})\) in parallel by distributing the attributes and instances among the cores of the cluster of machines.
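A compact single-process sketch of the same map/reduce split, assuming categorical attributes and Laplace smoothing; the smoothing constant and all helper names are assumptions of this example rather than details of Algorithms 6–7.

```python
from collections import Counter, defaultdict
import math

def nb_map(partition, attr_offset):
    """Map stage: count attribute values per (attribute, class) for one
    vertical slice of the attributes, starting at attr_offset.
    partition: iterable of (attribute_values, class_label) rows."""
    counts = defaultdict(Counter)   # (attr_index, class) -> value counts
    priors = Counter()              # class -> row count in this slice
    for values, label in partition:
        priors[label] += 1
        for j, v in enumerate(values):
            counts[(attr_offset + j, label)][v] += 1
    return counts, priors

def nb_reduce(map_outputs, x, classes, alpha=1.0):
    """Reduce stage: merge the per-slice counts, then pick the class with
    the highest posterior for instance x (log-space product)."""
    counts, priors = defaultdict(Counter), Counter()
    for i, (c, p) in enumerate(map_outputs):
        for key, cnt in c.items():
            counts[key].update(cnt)
        if i == 0:
            priors = p   # every vertical slice sees all rows, so class
                         # priors are taken from a single slice
    total = sum(priors.values())
    def log_posterior(cls):
        lp = math.log(priors[cls] / total)
        for j, v in enumerate(x):
            cnt = counts[(j, cls)]
            # Laplace smoothing; the vocabulary-size estimate is crude
            lp += math.log((cnt[v] + alpha) /
                           (sum(cnt.values()) + alpha * max(len(cnt), 1)))
        return lp
    return max(classes, key=log_posterior)
```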

4.1 Time Complexity

Let n be the number of instances in the dataset and d the number of attributes, with the dataset distributed among the \(n_c \) cores of the cluster. The computational complexity of PNBC is \(O(n *d/n_c )\), which shows that the computation time decreases steadily as the number of cores in the cluster increases.

5 Experiments and Results Analysis

In this section, the datasets, software and cluster configuration used in the experiments are described. The feature selection results of the proposed HPBBA and the classification results of the proposed PNBC are presented, and the detection results of the proposed parallel classification approach are compared with those of the KDDCup99 competition winners and other published work in the available literature.

5.1 Descriptions of Datasets, Software and Cluster Configuration

The proposed intrusion detection method was evaluated on the KDDCup99 dataset, which is widely used as a benchmark by many researchers for IDS evaluation. The three independent sets of KDDCup99 are the full KDDCup99 training set, the 10 % KDDCup99 training set and the 10 % KDDCup99 test set. Each record represents a network connection described by 41 features and a class label identifying the record as either normal or one of 39 specific attack types; the attack types fall into four attack classes, namely Denial of Service (DoS), Probe, Remote to Local (R2L) and User to Root (U2R). Table 1 lists the number of instances in the KDDCup99 training and test datasets.

Table 1 The number of instances in the training and testing datasets

The performance of our algorithms is evaluated on an in-house Hadoop cluster of 16 nodes. Each node has an Intel Core i5 3.2 GHz quad-core processor and 8 GB of main memory and runs the Ubuntu 12.1 operating system with Java JDK 1.6, Hadoop 2.6 and the Apache Mahout library installed. All nodes are equipped with Gigabit Ethernet network interface cards connected to the Gigabit ports of a high-speed managed switch, and the nodes communicate with one another over the secure shell protocol. The default Hadoop parameter configuration is used: the replication factor is set to 3 and the number of map tasks per core is set to one.

5.2 Results for Feature Selection Experiment

The feature selection experiment using PBBA was performed on the full KDDCup99 dataset with 41 features, and the most significant features, which contribute to improved classification accuracy, are listed in Table 2.

Table 2 List of features selected using PBBA

In the following sections, further characteristics of the proposed work, namely speedup, scaleup and sizeup, are examined by employing different sizes of the KDDCup99 training dataset on clusters of different sizes.

Fig. 3 Speedup of HPBBA

5.3 Speedup

Speedup, or strong scalability, evaluates the ability of the parallelism to reduce execution time. It is defined as the ratio of the sequential execution time to the parallel execution time and can be expressed as

$$\begin{aligned} Speedup\left( {m,D} \right) =\frac{ET\left( 1 \right) }{ET\left( m \right) } \end{aligned}$$
(10)

where m is the number of cores in the computing cluster, ET(1) is the execution time of the tasks on one core of a computing node, and ET(m) is the execution time of the parallel tasks on m cores [24, 32].

To measure the speedup of our algorithm, the full KDDCup99 dataset of 5 million instances is first processed on a single core, and the number of cores is then increased step by step, from 2 to 64 in the order 2, 4, 8, 16, 32 and 64. The speedup experiment is repeated with datasets of different sizes obtained by duplicating the full KDDCup99 dataset 2, 3 and 4 times. Figure 3 illustrates the experimental results.

The speedup of the HPBBA algorithm becomes approximately linear as the dataset size increases, especially when the dataset contains 15 and 20 million instances. The 64-core system performs much better on a very large dataset than on a smaller one because data processing then takes more time than the communication overhead of network latency among the nodes and the fault-tolerance machinery, which yields good speedup.

Ideal parallelism exhibits a linear speedup with an increasing number of cores in the computing cluster. In practice, however, linear speedup is very difficult to achieve because of the communication cost among nodes due to network latency and the skew among the slave nodes. Although the dataset is partitioned uniformly among the map tasks in the cluster, read/write performance varies among the disk drives of the nodes, and the slowest slave node determines the total time needed to complete the task.

5.4 Scaleup

Scaleup, or weak scalability, measures the ability of a parallel algorithm to grow both the number of processing cores in the cluster and the dataset size. It is defined as the ability of an m-times larger system to perform an m-times larger job in the same run time as the original system and can be expressed as

$$\begin{aligned} Scaleup\left( {m,D} \right) =\frac{ET\left( {1,D} \right) }{ET\left( {m,mD} \right) } \end{aligned}$$
(11)

where m is the number of cores in the computing cluster, ET(1, D) is the execution time of the tasks on one core with a dataset of size D, and ET(m, mD) is the execution time of the parallel tasks on m cores with a dataset m times the size of D. Ideal parallelism shows a constant scaleup as the number of computing cores and the dataset size increase [24, 32].

To demonstrate how well HPBBA scales up, experiments were performed in which the dataset size increased in proportion to the number of cores: datasets of 1.25, 2.5, 5, 10, 20 and 40 million instances were processed on 2, 4, 8, 16, 32 and 64 cores respectively, and Fig. 4 shows the performance results. The scaleup of HPBBA decreases slowly as the dataset size grows and remains above 81 %, so the algorithm scales very well.

Fig. 4 Scaleup of HPBBA

Fig. 5 Sizeup of HPBBA

5.5 Sizeup

Sizeup measures the ability of the parallel system to handle data growth. It calculates how much longer it takes to complete the parallel tasks when the dataset is n times larger than the original. The sizeup analysis keeps the number of computing cores in the cluster constant and increases the dataset size by a factor of n. It can be expressed as

$$\begin{aligned} Sizeup\left( m \right) =\frac{ET\left( {m,nD} \right) }{ET\left( {m,D} \right) } \end{aligned}$$
(12)

where m is the number of computing cores and n is the growth factor of the data size. ET(m, D) is the execution time of the parallel tasks with m computing cores on a dataset of size D, and ET(m, nD) is the execution time with m computing cores on a dataset n times the size of D [24, 32].
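The three metrics of Eqs. (10)–(12) reduce to simple ratios of wall-clock timings; the sketch below shows the arithmetic with placeholder timing values (hypothetical, not the measurements reported here):

```python
def speedup(et_1, et_m):            # Eq. (10): ET(1) / ET(m)
    return et_1 / et_m

def scaleup(et_1_D, et_m_mD):       # Eq. (11): ET(1, D) / ET(m, mD)
    return et_1_D / et_m_mD

def sizeup(et_m_nD, et_m_D):        # Eq. (12): ET(m, nD) / ET(m, D)
    return et_m_nD / et_m_D

# Placeholder timings in seconds, for illustration only
print(speedup(6400, 120))    # 1 core vs. 64 cores, same dataset
print(scaleup(100, 118))     # m-times larger job on m-times more cores
print(sizeup(430, 100))      # n-times larger dataset, fixed cluster
```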

To measure sizeup, the number of cores was set to 4, 8, 16, 32 and 64 with dataset sizes of 2.5, 5, 10, 20 and 40 million instances. Figure 5 shows the sizeup results for the different core counts. When the number of cores is small (4, 8 and 16), the sizeup values vary only modestly; on larger core counts (32 and 64), however, the sizeup value decreases significantly compared to the smaller configurations on the same datasets.

5.6 Results for Classification Experiments

The features selected in the above experiment are used as the inputs of the classifier, and the parallel Naive Bayes classification algorithm is used to evaluate the classification accuracy of the proposed feature selection approach. The distributed classifier is trained on the full KDDCup99 training dataset. To test the detection accuracy of the trained model and the performance of the parallelism, the 10 % KDDCup99 test dataset is duplicated 10 times.

The output of the detection model is summarized in a confusion matrix. Table 3 shows the confusion matrix, which consists of four outcomes: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).

Table 3 Confusion matrix

The classification performances of the proposed algorithms are measured using two metrics, namely Detection Rate (DR) and False Positive Rate (FPR). These can be calculated from the confusion matrix, and are defined as,

$$\begin{aligned} \hbox {Detection Rate (DR)}= & {} \hbox {TP{/}(TP+FN)}\end{aligned}$$
(13)
$$\begin{aligned} \hbox {False Positive Rate (FPR)}= & {} \hbox {FP{/}(FP+TN)} \end{aligned}$$
(14)
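Equations (13)–(14) can be checked with a few lines of code on an invented confusion matrix (the counts below are illustrative only):

```python
def detection_rate(tp, fn):          # Eq. (13)
    return tp / (tp + fn)

def false_positive_rate(fp, tn):     # Eq. (14)
    return fp / (fp + tn)

# Invented confusion-matrix counts, for illustration only
tp, fn, fp, tn = 9_500, 500, 120, 5_880
print(f"DR  = {detection_rate(tp, fn):.2%}")          # 95.00%
print(f"FPR = {false_positive_rate(fp, tn):.2%}")     # 2.00%
```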

The results in Table 4 show that the detection rates of all five classes improve when the 24 features selected by the PBBA feature selection approach are used. PNBC performs well in detecting the Normal, DoS and Probe categories, with a lowest false positive rate of 1.57 %. Its detection of DoS and Probe attacks is enhanced, while its performance on the R2L and U2R attack categories is weaker for the following reasons.

  • The attack class categories are unequally distributed in the full KDDCup99 training dataset, which significantly affects the classifier's learning and detection of R2L and U2R attacks.

  • In the complete KDDCup99 dataset, 79.4 and 19.7 % of the instances belong to the DoS and Normal categories respectively, and the remaining 0.9 % of the training instances belong to the Probe, R2L and U2R attack categories.

  • The number of instances of Probe attacks is nevertheless 41,102, which is sufficient for training the classifier.

  • Moreover, many new attacks in the R2L and U2R portions of the test dataset are absent from the training dataset, which lowers their detection rates.

Table 4 Detection rates and false positive rate of PNBC
Table 5 Comparison of overall detection rates

In attack detection, PNBC outperformed the other classifiers on the Probe, R2L and U2R categories. As shown in Table 5, on the KDDCup99 test dataset PNBC achieved the highest Probe detection rate of 93.54 %, an R2L detection rate of 47.41 % and a U2R detection rate of 74.13 %.

Table 6 Running time of PNBC on different sizes of clusters

Speed is always an important metric, as intrusion detection systems serve real-time applications. Table 6 shows the running times recorded for training and testing the PNBC classifier on clusters of different sizes, with 2, 4, 8 and 16 cores. Training takes longer than testing, and both training and testing times decrease significantly as the number of cores increases. The total running time for classifying all connection records of the 100 % KDDCup99 test dataset on a 16-core cluster is 176 s.

The literature survey shows that classification and feature selection for IDS have mostly been studied as sequential algorithms on a single node, with results reported in terms of complexity, detection rate and alarm rate. The proposed work, in contrast, was implemented as a parallel algorithm executed over multiple nodes, so the speedup, sizeup and scaleup figures of the present algorithm compare favorably with existing sequential methods. Our work has also been compared with some existing sequential approaches, and the proposed parallel approach achieves higher detection rates.

6 Conclusion

This work proposes methods for a fast and scalable IDS for large-volume data and aims to improve upon existing works from two perspectives. First, a parallel version of the Binary Bat Algorithm has been developed and employed as a wrapper-based feature selection tool; it substantially reduces the memory requirement of the IDS and the computational complexity of the classifier. Second, a parallel Naive Bayes model has been employed as the classification engine on a multi-node cluster; it improves the detection rate of the Probe and U2R attack categories and drastically reduces the detection time, in proportion to the number of nodes in the cluster.

The main findings of the present work can be summarized as follows:

  1.

    The proposed system is well suited to Big Data analysis.

  2.

    The detection rates for the Probe (93.54 %) and U2R (74.13 %) attack categories are better than those of the existing works discussed in the literature.

  3.

    The performance of the parallel execution system in terms of scaleup, speedup and sizeup has been measured on clusters of different sizes, and it is found to be proportional to the number of nodes in the cluster.