1 Introduction

Today system attacks are pervasive and growing in number, which has created an unprecedented demand for systems that can detect intrusions. IDSs were first created in 1980 by Anderson [1] and were later refined by Denning [2]. An IDS may be realized as a device or as a software tool, and it has been improving constantly since its inception. The main purpose of the intrusion detection system (IDS) is to seek out and react to intrusive and harmful activities aimed at a system’s resources and facilities. This is achieved by closely monitoring system activities and analyzing network traffic [3]. IDSs can be categorized into those that detect misuse and those that detect anomalies; the two differ in their means of detection. To detect misuse, the IDS searches for an attack’s fingerprint in a large database that stores all known attack signatures. Conversely, to detect anomalies, the IDS looks for deviations of system behavior from normative behavior. Combining both methods yields a third approach known as the ‘hybrid technique’, which naturally produces better results than either method alone [4].

Recently, data mining and neural networks have come to occupy a crucial position in improving the performance of IDSs. They have facilitated the process of categorizing the kinds of attacks needed to assess the effectiveness of an IDS. The main purpose of data mining is to extract useful information from large data warehouses and convert it into an understandable data structure. The most used methods in data mining are data preprocessing, clustering, pattern recognition, and classification. The most significant technique is classification, as it is of the utmost importance to accurately determine the target class for each instance in the data. Classification implies finding the hidden patterns in data and is a common problem in data mining and machine learning [5].

Quite a few data mining techniques have been utilized in building systems to detect anomalies; examples include artificial neural networks (ANNs), radial basis function (RBF) networks [6, 7], the multi-layer perceptron (MLP) [8,9,10], the fuzzy neural network (FNN) [9], the self-organizing map (SOM) [11, 12], support vector machines (SVMs) [13,14,15,16], and modified versions of the SVM [17, 18].

Recently, researchers have drawn on biology and natural systems to derive innovative techniques such as swarm intelligence, which models the behavior of animals and insects [19]. This behavior includes activities such as finding food sources, building nests, and moving nests from one place to another; analyzing it thoroughly has also helped to improve IDS performance. By making it easier to trace the source of an attack, such techniques made it possible to differentiate between malicious and non-malicious behavior, and also offered insight into some complicated issues [20].

Artificial neural networks (ANNs) are among the most significant parts of artificial intelligence. They are categorized as either supervised or unsupervised learning neural networks; the difference is that a supervised network learns from labeled examples, whereas an unsupervised network does not need them (as its name indicates). The success of a neural network depends on the following factors [21]: the architecture of the network, the training algorithm, and the features used in training. These factors make designing an optimal neural network difficult [22]. Furthermore, each of these factors should be chosen carefully so that the training algorithm does not fall into a local minimum. Methods based on heuristic algorithms for obtaining a good ANN model have been reported in [23].

Artificial neural networks have certain qualities that allow them to solve a range of problems such as pattern classification, regression, and forecasting. Some of their most notable traits are the ability to learn from examples, the adaptability to generalize, and the ability to solve problems such as pattern classification, function approximation, and optimization [24, 25].

ANNs have been realized in different kinds of multilayer neural networks. The majority of applications utilize feed-forward NNs trained with the typical back-propagation (BP) learning method, which relies on gradient descent (GD). Because this algorithm is gradient-based, some issues may be encountered when using it; the most notable are the propensity to get stuck in a local minimum and a rather low speed of convergence [26, 27].

Moreover, the BP method requires setting a few essential learning parameters such as the learning rate, the momentum, and a prearranged structure. The BP method uses a fixed NN structure, meaning it trains only the weights within that structure. As a result, it offers no solution for designing a near-optimal NN arrangement for a given application. Global optimum search methods are capable of avoiding local minima and are commonly utilized to adjust the weights of NNs; examples include the artificial bee colony (ABC) [21, 28], particle swarm optimization (PSO) [29, 30], evolutionary algorithms (EA), simulated annealing (SA), and ant colony optimization (ACO). They can therefore also be used to dispose of the problems in standard algorithms.

This research focuses on dealing with the fixed structure of the ANN. ANNs are among the most widely used machine-learning techniques. They have successfully solved many complex and practical problems that are difficult to solve through other methods. Despite this, the general architecture of ANNs still suffers from the local optima issue and low convergence speed [31,32,33]. There are three major drawbacks to an ANN-based intrusion detection system:

  • The error function of an ANN is a multimodal function that frequently traps the training process in local minima.

  • This type of ANN-based IDS demonstrates slow convergence.

  • Over-fitting: training usually produces an overly complex model that fits the training data too closely.

To overcome the shortcomings attributed to the BP training algorithm and avoid falling into a local minimum, we use our previously proposed HAM algorithm to train the ANN [34, 35]. Our new proposal may provide an effective and suitable alternative solution to the problem of training the multilayer perceptron neural network and of finding the global optimum among local optima in a multimodal search space. Our previous work suggests that the HAM algorithm can reliably find a globally optimal solution, whereas the BP algorithm can only guarantee reaching the local optimum at the bottom of the slope on which its initial point lies.

In this article we propose the new HAMMLP technique, which improves the intrusion detection rate and reduces the false alarm rate. The main idea of the new approach is to solve the problem of the training algorithm in ANNs so as to identify new attacks on the IDS, and to evaluate three IDS datasets, two of which are new, namely UNSW-NB15 and ISCX 2012, as well as the older KDD Cup 99 dataset. In this work we demonstrate that we managed to solve the dataset-related problems whilst enhancing the detection of intrusions. The main contributions of this research are summarized as follows:

  1.

    Building a new hybrid HAM algorithm that optimizes the MLP neural network and demonstrating its effectiveness in addressing the shortcomings of artificial neural networks in the field of network intrusion detection.

  2.

    Assessing the performance, reliability and validity of the new technique in detecting new attacks by using two new datasets (ISCX 2012 and UNSW-NB15) and comparing them with the KDD Cup 99 dataset.

  3.

    Comparing our new proposed approach with other evolutionary and swarm intelligence algorithms, using the IDS datasets to confirm the applicability of our approach.

  4.

    Comparing our work with other related works in the literature by using the same datasets to evaluate the performance of our proposal.

The advantages of our proposal are as follows:

  • A high accuracy rate in detecting attacks on the network.

  • The possibility of detecting unknown attacks.

  • A reduced false alarm rate.

Section 2 reviews the materials and methods: it describes the methodology, outlines the mathematical overview of the neural network and the HAM algorithm, and then discusses how the HAM algorithm can be deployed to train the MLP. Section 3 describes the experimental setup, which includes the validation of the IDS, the algorithms and parameters, and the performance measures. Section 4 presents the experimental results. Section 5 summarizes the conclusions and provides directions for future research.

2 Materials and Methods

2.1 Hybrid Algorithm Based on Artificial Bee Colony and Monarch Butterfly Optimization

The most important factors in metaheuristic algorithms are the exploitation and exploration search mechanisms. A good metaheuristic algorithm strikes a balance between these two mechanisms and thereby solves both low- and high-dimensional optimization problems more effectively. The exploitation mechanism uses present knowledge to seek better solutions, while the exploration mechanism searches the whole problem space for an optimal solution [34, 35].

In general, analyzing the standard MBO algorithm shows that it explores the search space very effectively and finds the global optimum quickly; however, it exploits the local search space poorly, owing to the occasional use of Levy flights by its updating operators, which lead to large steps (or moves). On the other side, the ABC algorithm explores the search space relatively well but is better at finding local optima, through the two phases of employee and onlooker bees, which are considered local search processes. ABC is mostly based on selecting the solutions that improve the local search. There is one fundamental difference between those two phases: an onlooker bee relies on the probability value of a solution in order to select it, choosing solutions with high fitness values, while solutions with low fitness values are used to produce the trial solutions. Global search, on the other hand, is implemented in the ABC algorithm by the scout phase, which reduces the convergence speed during the search process.

As presented in our previous work [34, 35], the main idea of the HAM algorithm is based on two improvements. The first is to modify the butterfly adjusting operator in the MBO algorithm in order to improve the exploitation versus exploration balance, by increasing the search diversity and counterbalancing the shortfall of the ABC algorithm in global search efficacy. The modified version of the operator is shown in Algorithm 1. The second improvement is to integrate the modified butterfly adjusting operator from MBO in place of the first phase of the standard ABC algorithm (the employee bee phase). The improved operator is named the “employee bee adjusting operator”.

Algorithm 1 The modified butterfly adjusting operator

The proposed Hybrid ABC and MBO (HAM) algorithm is shown in Fig. 1. This algorithm includes four phases: Initialization, Employee bee adjusting phase, Onlooker bee phase and Scout bee phase (the onlooker and scout phases are the same phases inherited from the standard ABC algorithm). Thus, the new HAM algorithm is essentially an integration of the effective local search phase (onlooker bee) and two global search phases (Employee bee adjusting and Scout bee) for effective global optimization.

Fig. 1 Flowchart of the HAM algorithm

In the initialization phase we define all the variables used in the standard ABC algorithm and assign them suitable values. The HAM algorithm adopts all parameters from the original ABC algorithm and then adds three new control parameters: limit1, limit2, and the maximum walk step parameter; these three parameters are used in the employee bee adjusting phase.

In the employee bee adjusting phase, each employee bee is assigned to its food source and in turn generates a new one, either by using a Levy flight or through mutation operators, based on the two control parameters (limit1 and limit2). These parameters are used to fine-tune exploitation versus exploration by improving the global search diversity. The employee bee adjusting phase is very simple and is used to update all solutions in the bee population, where each solution is a D-dimensional vector.

The first step in this phase is to calculate a walk step “\( dx \)” for the ith bee using the Levy flight in Eq. 1, and a weighting factor “\( \alpha \)” using Eq. 2, where \( S_{max} \) represents the maximum walk step that a bee individual can move in one step, and t is the current generation. Then, for each element j of the D dimensions, if (rand \( \ge \) limit1), the algorithm uses Eq. 4 to update the solution element:

$$ dx_{k} = levy(x_{j}^{t}) $$
(1)
$$ \alpha = S_{max} /t^{2} $$
(2)
$$ x_{i,j}^{t + 1} = x_{i,j}^{t + 1} + \alpha \times (dx_{k} - 0.5) $$
(3)
$$ x_{i,j}^{t + 1} = x_{best,j}^{t} $$
(4)

where \( x_{i,j}^{t + 1} \) is the jth element of solution \( x_{i} \) at generation t + 1, which represents the location of solution i, while \( x_{best,j}^{t} \) is the jth element of \( x_{best} \) at generation t, which represents the best location among the food sources found so far with respect to the ith bee. In contrast, if (rand \( < \) limit1), another set of updates is performed. First, a random food source (equivalent to a random solution or bee) is selected from the current population using Eq. 5. Then, if a randomly generated value is smaller than limit2, Eq. 6 is used to update the solution elements, as follows:

$$ r = round((SN*rand) + 0.5) $$
(5)
$$ x_{i,j}^{t + 1} = x_{r,j}^{t} + 0.5*rand*(x_{worst,j}^{t} - x_{r2,j}^{t} - x_{best,j}^{t}) $$
(6)

where \( x_{i,j}^{t + 1} \) is the jth element of solution \( x_{i} \) at generation t + 1, which represents the location of solution i, \( x_{best,j}^{t} \) and \( x_{worst,j}^{t} \) are the jth elements of the best and worst food-source locations found so far at generation t, and \( x_{r,j}^{t} \) is the jth element of the solution \( x_{r} \) selected by Eq. 5. The t in Eq. 6 is the current generation number.

On the other hand, if the randomly generated value is bigger than limit2, the solution elements are updated by Eq. 7, using the same definitions as above, where \( x_{r,j}^{t} \) is the jth element of the solution \( x_{r} \) selected by Eq. 5.

$$ x_{i,j}^{t + 1} = x_{r,j}^{t} + 0.5*rand*\left({x_{best,j}^{t} - x_{r3,j}^{t} - x_{worst,j}^{t}} \right) $$
(7)

The Levy flight step from the MBO algorithm is adopted here with a smaller probability of execution, to reduce its impact on the exploitation process. If the execution path has passed the tests of the limit1 and limit2 control parameters, yet another random check, against the BAR parameter, is performed right after the update by Eq. 7, occasionally further changing the value of \( x_{i,j}^{t + 1} \) by the amount \( \alpha \times ({dx_{k} - 0.5}) \), as per Eq. 3.

Finally, the employee bee adjusting phase checks the boundaries of the new solution to make sure the newly generated solution lies within the allowed bounds of the optimization problem at hand. It then evaluates the fitness value of the new solution and applies a ‘greedy’ selection between the new solution and the best one in order to keep the better of the two. If the solution does not improve, a trial counter is incremented by one. As for the onlooker bee and scout phases, the algorithm adopts their implementation from the original ABC algorithm without any change.
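To make the control flow of this phase concrete, the following Python sketch implements Eqs. 1–7 as described above. It is a minimal illustration rather than the paper’s MATLAB implementation: the levy_step helper is a crude stand-in for the Levy flight inherited from MBO, a single random index r stands in for the r2 and r3 indices appearing in Eqs. 6 and 7, the direction of the BAR test follows the original MBO convention, and the boundary check and greedy selection are only indicated in comments.

```python
import numpy as np

def levy_step(D, n_terms=10):
    # Crude Levy-flight-style walk step (an assumption; the paper inherits
    # its Levy flight from the MBO algorithm).
    return np.array([np.sum(np.tan(np.pi * np.random.rand(n_terms)))
                     for _ in range(D)])

def employee_bee_adjusting(pop, best, worst, t,
                           limit1=0.5, limit2=0.5, BAR=0.4167, S_max=1.0):
    """One pass of the employee bee adjusting phase; t is the current
    generation (t >= 1) and pop has shape (SN, D)."""
    SN, D = pop.shape
    alpha = S_max / t ** 2                     # weighting factor, Eq. 2
    new_pop = pop.copy()
    for i in range(SN):
        dx = levy_step(D)                      # walk step for bee i, Eq. 1
        for j in range(D):
            if np.random.rand() >= limit1:
                new_pop[i, j] = best[j]        # move toward the best, Eq. 4
            else:
                r = np.random.randint(SN)      # random food source, Eq. 5 (0-based)
                if np.random.rand() < limit2:  # mutation operator, Eq. 6
                    new_pop[i, j] = pop[r, j] + 0.5 * np.random.rand() * (
                        worst[j] - pop[r, j] - best[j])
                else:                          # mutation operator, Eq. 7
                    new_pop[i, j] = pop[r, j] + 0.5 * np.random.rand() * (
                        best[j] - pop[r, j] - worst[j])
                    if np.random.rand() > BAR:              # occasional Levy move
                        new_pop[i, j] += alpha * (dx[j] - 0.5)  # Eq. 3
    # The full algorithm then clips each new solution to the search bounds
    # and applies greedy selection, incrementing a trial counter when the
    # solution does not improve.
    return new_pop
```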

2.2 The HAM Adaptation Process

The adaptation process is an important step in optimizing an ANN by using evolutionary and swarm intelligence algorithms for training. In all evolutionary-based and swarm-based methods, the training process translates into choosing a suitable MLP weight representation, a fitness function, and termination condition(s). Consequently, in order to adapt the HAM algorithm as an MLP training method, these three issues must be resolved to suit the operation of the HAM algorithm and to satisfy the requirements of the MLP training process in general. This new training method is called the HAMMLP algorithm.

The methodology of using a metaheuristic algorithm for training neural networks follows one of three methods. In the first, the algorithm is used to find a combination of weights and biases that yields the minimum error for a given neural network, i.e. to reduce the mean square error (MSE), which represents the cost function of the ANN. In the second, the algorithm is used to find a suitable structure for a neural network on a particular problem. In the third, the metaheuristic algorithm is used to tune the parameters of a gradient-based learning algorithm, such as the learning rate and momentum. In the first method the structure of the neural network model (MLP) does not change during the learning process; the training algorithm seeks suitable values for all connection weights and biases that minimize the overall error of the MLP. In the second method the architecture of the MLP model varies; the training algorithm determines the best architecture of the MLP model for solving a specific problem. Changing the architecture can be accomplished by manipulating the connections between neurons, the number of hidden layers, and the number of hidden nodes in each layer.

For example, Yu et al. [36] used the PSO algorithm to define the architecture of an MLP model for solving two real problems, and Leung et al. used an EA to tune the parameters of an FNN, applying the last method. Some studies have employed a combination of methods simultaneously: for instance, Mizuta et al. [37] and Leung et al. [38] used the genetic algorithm and the improved genetic algorithm, respectively, to define the architecture of an FNN model. In this work, the HAM algorithm is applied to an MLP using the first method. To design a HAM trainer for MLPs, the following main stages should be completed:

2.2.1 Representing the Weights and Biases for MLP Using HAMMLP

This section details the HAMMLP method for improving the weights and biases of the MLP model, the aim being to reduce the overall error. Any MLP model depends on the number of hidden layers and the number of nodes in each hidden layer, which together determine the weights, while a bias is attached to each node in the hidden and output layers.

In fact, weights and biases can be represented by one of three methods: matrix, binary, or vector. This was presented in detail in our previous work [39, 40]. In matrix encoding every solution is encoded as a matrix; to train the MLP, each solution represents all the weights and biases. In the binary representation, solutions are encoded as strings of binary bits, and in the vector representation every agent is encoded as a vector. Each of these three methods has its own advantages and disadvantages that can be useful in a particular application [41].

In the vector method the encoding process is much easier than the decoding process; it is often used for simple neural network models. In the matrix method the decoding process is easy but the encoding is difficult for neural network models with complex architectures; it is very appropriate for learning algorithms built on generalized neural network toolboxes. The binary method, naturally, needs to represent variables in binary form; in this case the length of each solution increases as the architecture becomes more complex, which makes the encoding and decoding processes very intricate.

This study does not use any generic toolboxes, because the run time of our hand-coded MLPs is much lower. As an example of the encoding process, the final vector of the MLP is shown in Fig. 2.

Fig. 2 Solution representation of HAMMLP algorithm

This study proposes a new algorithm (HAMMLP) for training MLPs. In HAMMLP, solutions are represented by two vectors:

  1.

    The first vector represents the solution structure, which contains the number of inputs, the number of hidden layers, and the number of nodes in each hidden layer of the MLP model.

  2.

    The second vector represents the weights and biases of the solution, corresponding to the weights and biases in the MLP model. For the purpose of this study the structure is fixed before training the MLPs. The objective is a training algorithm that excels at finding suitable values for all connection weights and biases, ultimately minimizing the MLP’s overall error.

Each of these two vectors has a different representation: a cell of the structure vector contains 0 or 1, while a cell of the weights-and-biases vector contains a real number in the range [0, 1]. The solution representation in HAMMLP is provided in Fig. 2. The solution structure vector is divided into three groups; the first group contains a set of cells representing the number of nodes in the input layer, which corresponds to the features of the dataset. The length of the weights-and-biases solution vector is the number of weights plus the number of biases in the neural network model; it is given by Eq. 8. The number of weights and biases depends on the number of hidden layers and the number of nodes in each hidden layer participating in the solution structure; they are calculated by Eqs. 9 and 10.

$$ {\text{Length}}\;{\text{of}}\;{\text{weights}}\;{\text{and}}\;{\text{biases}}\;{\text{solution}}\;{\text{vector}} = W + B $$
(8)
$$ W = \left({I \times N} \right) + \left({\left({N \times N} \right) \times \left({H - 1} \right)} \right) + \left({N \times O} \right) $$
(9)
$$ B = H \times N + O $$
(10)

where W represents the number of weights, B represents the number of biases; I denotes the number of nodes in the input layer, N represents the number of nodes in each hidden layer, H represents the number of hidden layers, and O denotes the number of nodes in the output layer.
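As a concrete check of Eqs. 8–10, the following short Python function (a sketch; the paper’s experiments were implemented in MATLAB) computes the length of the weights-and-biases vector:

```python
def solution_vector_length(I, N, H, O):
    """Length of the weights-and-biases solution vector per Eqs. 8-10."""
    W = (I * N) + (N * N) * (H - 1) + (N * O)  # number of weights, Eq. 9
    B = H * N + O                              # number of biases, Eq. 10
    return W + B                               # vector length, Eq. 8

# Example: a single-hidden-layer MLP for KDD Cup 99 with I = 41 inputs,
# N = 2*41 + 1 = 83 hidden nodes, and O = 1 output node.
print(solution_vector_length(I=41, N=83, H=1, O=1))  # -> 3570
```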

In the case of classification (as feature selection is vital in the classification process), several cells are added to the solution structure to select the inputs of the neural network as a feature selection part. The number of cells in this features part equals the number of conditional features in the dataset. If a cell in the features part is one, the corresponding feature from the conditional feature set is included in the subset; if it is zero, the feature is excluded. All the experiments in this study employ the complete set of features in the dataset.

2.2.2 Adapting the HAMMLP Quality Measure (Fitness Function)

The HAMMLP algorithm must be able to judge the quality of an improvised solution; for this it uses a quality measure, i.e. a fitness function. The HAM algorithm is very similar to other optimization algorithms in that it maximizes or minimizes a measure computed by this fitness function. The goal of the fitness function is similar to its role in other optimization algorithms and in the training methods of [41, 42]: reducing the overall error. Thus the fitness function could utilize any of the MLP error formulas, or a new fitness function could be derived from these formulas, where the goal is to minimize this error.

In this work, MSE is used as the principal quality measure of the proposed HAM training algorithm. The training goal is to minimize the MSE until the maximum number of iterations has been reached.

The MSE is one of the most commonly used fitness functions; it is chosen as the main quality measure for the proposed MLP training algorithm, as this work considers classification problems. The MSE, as the main fitness function, is the measure by which the food source vectors are sorted from best to worst, the best being the solution with the smallest MSE value. Thus, for an MLP with an acceptable weights-and-biases vector to be an optimal solution, its MSE value must be smaller than the worst one in the current food source memory vectors (FSMV). The fitness function is responsible for evaluating the quality of the solutions in successive iterations; with its help, the solution that optimizes the quality measure is picked.

In order to compute the MSE, forward pass calculations must be performed first on the given MLP structure. This is a repetitive process that involves loading the entire training dataset. It requires the network weights and biases (represented by the solution vector) to be loaded into the MLP structure. The MLP structure must therefore be flexible enough to allow loading different weight and bias vectors during the HAMMLP initialization and improvisation processes; the forward pass computation process is shown in Fig. 3.

Fig. 3 The HAMMLP-IDS framework

The objective in training an MLP is to reach the highest classification, approximation, or prediction accuracy for both training and testing samples. A common metric for the evaluation of an MLP is the mean square error (MSE). In this work we apply the same method as in [43,44,45] to calculate the fitness function. From Fig. 2(a) we note that an MLP with three layers contains one input, one hidden, and one output layer. The number of input nodes is n, the number of hidden nodes is h, and the number of output nodes is m. The output of the jth hidden node is calculated as follows:

$$ f\left({S_{j}} \right) = 1/\left({1 + exp\left({- \left({\mathop \sum \limits_{i = 1}^{n} {\mathcal{W}}_{ij}.{\mathcal{X}}_{i} - \theta_{j}} \right)} \right)} \right), j = 1, 2, \ldots,h $$
(11)

where \( S_{j} = \mathop \sum \limits_{i = 1}^{n} {\mathcal{W}}_{ij}.{\mathcal{X}}_{i} - \theta_{j} \), n is the number of input nodes, \( {\mathcal{W}}_{ij} \) is the connection weight from the ith node in the input layer to the jth node in the hidden layer, \( \theta_{j} \) is the bias (threshold) of the jth hidden node, and \( {\mathcal{X}}_{i} \) is the ith input. After calculating the outputs of the hidden nodes, the final output can be defined as follows:

$$ {\mathcal{O}}_{k} = \mathop \sum \limits_{j = 1}^{h} {\mathcal{W}}_{kj}.f\left({S_{j}} \right) - \theta_{k} , k = 1, 2, \ldots,m, $$
(12)

where \( {\mathcal{W}}_{kj} \) is the connection weight from the jth hidden node to the kth output node and \( \theta_{k} \) is the bias (threshold) of the kth output node. Finally, the learning error \( E \) (fitness function) is calculated as follows:

$$ E_{k} = \mathop \sum \limits_{i = 1}^{m} \left({{\mathcal{O}}_{i}^{k} - d_{i}^{k}} \right)^{2} $$
(13)
$$ E = \mathop \sum \limits_{k = 1}^{q} \frac{E_{k}}{q} $$
(14)

where \( q \) is the number of training samples, \( d_{i}^{k} \) is the desired output of the ith output unit when the kth training sample is used, and \( {\mathcal{O}}_{i}^{k} \) is the actual output of the ith output unit when the kth training sample is used. Therefore, the fitness function of the ith solution can be defined as follows:

$$ {\text{Fitness}}\;\left({x_{\text{i}}} \right) = {\text{E}}\left({x_{\text{i}}} \right) $$
(15)
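The forward pass and fitness computation of Eqs. 11–15 can be sketched as follows. This is a minimal NumPy version for a single-hidden-layer MLP; the names W1, theta1, W2 and theta2 are an assumed decoding of the solution vector, not the paper’s own notation:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def mse_fitness(W1, theta1, W2, theta2, X, D):
    """MSE of an n-h-m MLP over q training samples (Eqs. 11-15).

    X: (q, n) inputs; D: (q, m) desired outputs; W1: (n, h) input-to-hidden
    weights; theta1: (h,) hidden biases; W2: (h, m) hidden-to-output
    weights; theta2: (m,) output biases.
    """
    F = sigmoid(X @ W1 - theta1)        # hidden-node outputs, Eq. 11
    O = F @ W2 - theta2                 # network outputs, Eq. 12
    E_k = np.sum((O - D) ** 2, axis=1)  # per-sample squared error, Eq. 13
    return float(np.mean(E_k))          # MSE fitness, Eqs. 14-15
```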

2.2.3 Training of the MLP with the HAM Algorithm

Figure 3 shows the flowchart of a generic HAM-based MLP training approach for intrusion detection that utilizes the HAM algorithm introduced above. It illustrates the framework of our proposed HAMMLP-IDS, which can be dissected into four main modules: parameter initialization, data input, neural network training, and the HAM module.

The first stage of our proposed framework is the initialization of the parameters of the HAM algorithm and the neural network module. The HAM algorithm has many variables, including the population size SN (Solution Number), which is the number of food sources, or solutions, in the population. Every solution xi (i = 1, 2, …, SN) is an N-dimensional vector, where N is the number of decision variables. The lower and upper limits of the ranges are specified by two vectors xL and xU, both of length N. The limit parameter is used to diversify the search by determining the number of allowed iterations after which a non-improved solution is abandoned; additionally there are three control parameters: limit1 and limit2, which adjust the mutation operators in the HAM algorithm, and the maximum walk step parameter Smax. The Food Source Memory (FSM) is a matrix of the best solution vectors achieved so far; it is an augmented matrix of size SN × N, with one solution per row, as in Eq. (16). The FSM size is set prior to running the algorithm. Each source vector is also associated with a source quality value (fitness) based on an objective function f(x). The algorithm in Fig. 1 resembles a standard optimization algorithm in that it begins by initializing the HAM with random food source memory vectors representing candidate MLP weight vectors.

$$ FSM = \left[ {\begin{array}{cccc} {x_{11}} & {x_{12}} & \cdots & {x_{1N}} \\ {x_{21}} & {x_{22}} & \cdots & {x_{2N}} \\ {x_{31}} & {x_{32}} & \cdots & {x_{3N}} \\ \vdots & \vdots & \ddots & \vdots \\ {x_{SN1}} & {x_{SN2}} & \cdots & {x_{SNN}} \\ \end{array}} \right]\left[ {\begin{array}{c} {f(x_{1})} \\ {f(x_{2})} \\ {f(x_{3})} \\ \vdots \\ {f(x_{SN})} \\ \end{array}} \right] $$
(16)

On the other hand, the number of neurons in the layers of the neural network is determined by the number of features of each dataset. For example, for a neural network built on the KDD Cup 99 dataset, which has 41 features, the number of neurons in the input layer is 41; this number in turn determines the length of the solution vectors in the HAM algorithm. The number of neurons in the hidden layer is calculated using Kolmogorov’s theorem: one hidden layer with 2N + 1 neurons, where N is the number of neurons in the input layer. The output layer has a single neuron for each IDS dataset used in this work, as all the datasets are labeled with binary classes [0 or 1].

The second stage is an important one as it uses the data input module. This module is responsible for processing, filtering, and extracting the features from the raw data. One of the most important steps in this module is to divide the raw data into a training and a testing set, which are then used as input data for the neural network module. Before sending the data into the NN module, we must map the inputs into the range [0, 1] to make the data usable by the next module.

In the third stage, the ANN module begins to function after receiving the training attributes of the input data from the previous module. This module is designed as a multilayer perceptron (MLP), a type of feed-forward neural network. The MLP consists of three layers of neurons: one input layer, one hidden layer, and one output layer. The ANN module receives the data coming from the data input module, which serve as training patterns (the training dataset) for training the ANN. The training process in this module is implemented by sending the weights and biases to the HAM module.

The fourth stage of our proposed framework uses the HAM algorithm as a standalone system (black box) for generating new solutions, which in turn are based on updated synaptic weights and biases (after each iteration). In each iteration of the training process the HAM module sends its individual solutions, as sets of weights and biases, to the ANN module; the ANN module then plays an important role by evaluating these individual solutions on the training dataset and returning their fitness values. The fitness function selected in this work is the mean squared error (MSE), a well-known fitness function that is well suited to the proposed HAM training algorithm.

The weights and biases are obtained by minimizing the MSE value. The training process stops when the maximum number of iterations is reached, after which the knowledge base (weights and biases) is updated. In the final step, after training on the training dataset is finished, we obtain the optimal solution from the HAM module. We apply this optimal solution to the testing inputs, fed from the testing dataset into the trained ANN module, to predict the output. Testing the ANN amounts to matching the predicted output with the closest of the target classes.
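Putting the four modules together, the training loop can be summarized by the sketch below. Here ham is treated as the black box described above, with hypothetical candidate_solutions, report_fitness and best_solution methods, and decode() is an assumed helper that unpacks a solution vector into MLP weights and biases (used with the mse_fitness sketch of Sect. 2.2.2):

```python
def train_hammlp(ham, decode, X_train, y_train, max_iter=50):
    """High-level HAMMLP training loop (illustrative, not the paper's API)."""
    for _ in range(max_iter):
        for vec in ham.candidate_solutions():  # candidate weight & bias vectors
            W1, th1, W2, th2 = decode(vec)     # load the vector into the MLP
            fit = mse_fitness(W1, th1, W2, th2, X_train, y_train)
            ham.report_fitness(vec, fit)       # greedy selection happens inside HAM
    return decode(ham.best_solution())         # optimal MLP used for testing
```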

Integrating the various parts of the adaptation process presented in the previous sections leads to the HAM-based MLP training algorithm, provided as a flowchart in Fig. 4. The region of the flowchart enclosed by the dashed rectangle is the food source memory (FSM) initialization phase, which essentially involves generating random weight and bias vectors from the allowable range [Lower, Upper] and computing the corresponding MSE values for each by carrying out the forward pass computations.

Fig. 4 The HAMMLP training algorithm flowchart

The forward pass, represented by the shaded side-framed rectangle, is the pre-defined process provided previously in Fig. 3. It is also needed during the HAM training process to measure the quality (fitness) of each newly improvised weight and bias vector.

3 Experimental Framework

The implementation and evaluation of the proposed framework were conducted on a laptop with a Core i5 2.4 GHz CPU and 8 GB RAM, using MATLAB R2014a running on Windows 7. To evaluate the performance of the HAMMLP-IDS framework we implemented three experiments, each using a different dataset for the offline evaluation of the IDSs, namely KDD Cup 99, ISCX 2012, and UNSW-NB15, against nine metaheuristic algorithms that were adapted to train ANNs in the same way as our proposed framework.

3.1 Datasets Used for Experiments

There are many datasets employed in assessing intrusion detection systems. One of the oldest is KDD Cup 99, although it is still used in much research to this day. Several datasets have also emerged in recent years for evaluating IDSs, such as ISCX 2012, which was developed in 2012, and UNSW-NB15, which was developed in 2015.

3.1.1 KDD Cup 99 Dataset

The most popular and widely used dataset for the detection of intruders and anomalies is the aforementioned “KDD Cup 1999 Dataset” [46, 47], which was created and developed in 1999 by Lee and Stolfo [48]. It was built on information obtained from the MIT Lincoln Laboratory, under the sponsorship of the Defense Advanced Research Projects Agency (DARPA ITO) and the Air Force Research Laboratory (AFRL/SNHS). It consists of approximately 5 million records. Each record represents a TCP/IP packet connection and contains 41 attributes (features), of which 38 are numeric and 3 are symbolic. The dataset is divided into training and testing sets, and it has four attack classes: DoS, U2R, R2L, and Probe [49].

In this work we have used four subsets of the KDD Cup 99 dataset that were selected randomly by Zainal in 2007 and have since been used by many researchers [50,51,52,53,54,55]. Each subset houses approximately 4000 records, with nearly half of the data (50–55%) belonging to the normal category and the remainder being attacks. Dataset 1 is used for training, and datasets 2, 3, and 4 are used for testing. The classes of all the datasets, the number of records, and the percentage of occurrence of the feature classes are tabulated in Table 1.

Table 1 Distribution statistics of the KDD Cup 99 training and testing datasets

3.1.2 ISCX 2012 Dataset

In order to overcome the limitations of the KDD Cup 99 dataset, the ISCX 2012 intrusion evaluation dataset, built at the Information Security Center of Excellence (ISCX), is used in developing, testing, and evaluating the performance of the proposed approach for intrusion and anomaly detection. The entire labeled ISCX dataset comprises nearly 1,512,000 packets with 20 features and covers seven days of network activity (both normal and intrusive). The dataset was created because anomaly-based approaches in particular suffer from inaccurate assessment, comparison, and deployment, a consequence of the scarcity of adequate data. Many existing datasets are internal and cannot be shared because of privacy issues; others are heavily anonymized and do not reflect current trends, or they lack certain statistical characteristics.

The ISCX 2012 dataset is available in packet capture form. Features are extracted from the packet format by using the tcptrace utility (downloaded from http://www.tcptrace.org) and applying the following command: tcptrace --csv -l filename1.7z > filename1.csv. Since a ready-made training and testing split is not available, and it is difficult to perform experiments on huge sets of data, we decided to select the incoming packets of a particular host on particular days to validate the proposed approach, as presented in Table 2. The training data contains 54,344 normal traces and 27,171 attack traces, while the testing data contains 16,992 normal traces and 13,583 additional attack traces. A further description of this dataset can be found in [56].

Table 2 Distribution statistics of the ISCX 2012 training and testing datasets

3.1.3 UNSW-NB15 Dataset

The UNSW-NB15 dataset is a hybrid of contemporary synthesized attack activities and real modern normal network traffic. The dataset is available at the following link: http://www.cybersecurity.unsw.adfa.edu.au/ADFA%20NB15%20Datasets/. In 2015 the researchers Nour and Jill created UNSW-NB15 as a hybrid of abnormal network traffic and modern normal traffic, using the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security. Each of the KDD’99, NSL-KDD, and UNSW-NB15 datasets has more than forty features. It is important to mention that KDD’99 and NSL-KDD share only a few common features with the UNSW-NB15 dataset, and the rest of the features are different, making it harder to compare them [57, 58].

The UNSW-NB15 dataset includes nine different modern attack types (compared to 14 attack types in the KDD’99 and NSL-KDD datasets) and a wide variety of real normal activities, as well as 44 features inclusive of the class label, for a total of 2,540,044 records. The UNSW-NB15 features are classified into six groups: Basic Features (BF), Content Features (CF), Flow Features (FF), Time Features (TF), Additional Generated Features (AGF), and class features. The Additional Generated Features are further classified into two sub-groups, namely Connection Features and General Purpose Features.

The UNSW-NB15 dataset has been divided into two subsets: the first, containing 175,341 records (56,000 attack and 119,341 normal), represents the training dataset, and the second, containing 82,332 records (45,332 attack and 37,000 normal), represents the testing dataset; both include all attack types and normal traffic records. The training and testing datasets each have 45 features; the distribution statistics of UNSW-NB15 are shown in Table 3. It is important to note that the first feature (i.e. id) is not mentioned among the full UNSW-NB15 dataset features, and that the features srcip, sport, dstip, stime, and ltime are missing from the training and testing datasets [59].

Table 3 Distribution statistics of the UNSW-NB15 training and testing datasets

3.1.4 Preprocessing Dataset

Dataset processing is the most important step; it is applied to all the datasets used in this work. The processing is divided into two phases.

In the first phase we determine the dataset that will be used to evaluate the algorithms. Because these datasets are too large to load into memory, we selected random sample records from them. Each random sample is divided into two subsets: the first is called the training dataset, and the second is called the testing dataset.

The second phase converts symbolic values to numeric values and then normalizes the numeric values. All the datasets include symbolic features, for example the protocol feature (e.g. tcp, udp), the service feature (e.g. telnet, ftp), and the flag feature. These are incompatible with the classification method, so we need to ensure that all symbols are converted to numeric values and that every class is represented as a numeric value.

Normalization means bringing the numeric values into the same range. The numeric attribute values in the dataset have to be normalized before being used in the training algorithm, in order to give the attribute values regular semantics [60]. To normalize the values of an attribute x, given xmin and xmax (the minimum and maximum values of x), each value is converted to the new range as per Eq. (17).

$$ x_{new} = \frac{{x_{current} - x_{min}}}{{x_{max} - x_{min}}} $$
(17)

Class identification: each of the four datasets used in this work includes a class label for every set of features, indicating either a normal connection or an attack. This means that each record in the dataset belongs to one of the two major classes, Normal or Attack. The values of each class are mapped to numeric values: specifically, the Normal class was mapped to the number 0 and the Attack class to 1. All the datasets used to evaluate our proposal thus contain two classes, normal represented by zero and abnormal represented by one.
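The two preprocessing phases (symbolic-to-numeric conversion, class mapping, and the min–max normalization of Eq. 17) can be sketched as follows; pandas is used here for brevity, whereas the paper’s experiments were run in MATLAB, and the column names are illustrative:

```python
import pandas as pd

def preprocess(df, symbolic_cols=('protocol', 'service', 'flag'),
               label_col='class'):
    """Symbolic -> numeric conversion, class mapping, min-max scaling (Eq. 17)."""
    df = df.copy()
    for col in symbolic_cols:                   # e.g. tcp/udp, telnet/ftp, flags
        df[col] = df[col].astype('category').cat.codes
    # Normal -> 0, any attack -> 1 (the binary classes used throughout this work)
    df[label_col] = (df[label_col] != 'normal').astype(int)
    feats = df.columns.drop(label_col)
    rng = (df[feats].max() - df[feats].min()).replace(0, 1)  # guard constant columns
    df[feats] = (df[feats] - df[feats].min()) / rng          # Eq. 17
    return df
```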

3.2 Algorithms and Parameters

In this work a study was carried out involving several algorithms for a fair analysis of the proposal. These are: the Artificial Bee Colony algorithm (ABC) [61], Ant Colony Optimization (ACO) [62], the Ant Lion Optimizer (ALO) [63], Elephant Herding Optimization (EHO) [64], the Evolution Strategy (ES) [65], Harmony Search (HS) [66], Monarch Butterfly Optimization (MBO) [67], the Sine Cosine Algorithm (SCA) [68], and the Whale Optimization Algorithm (WOA) [69].

We set the common control parameters of all algorithms to the same values, including the population size SN and the dimensionality of the search space D, which represents the number of features in the dataset. The parameters of all algorithms used in this work are presented below (and collected into a configuration sketch after the list):

  • ABC parameter settings: The colony consists of 50 employed bees and 50 onlooker bees, for a colony size of 100; limit is set to 100.

  • MBO parameter settings: There are many parameters in the MBO method. In this work we follow the setup of the original MBO paper and set the butterfly adjusting rate BAR = 0.4167, the max step Smax = 1.0, the migration period Peri = 1.2, and the migration ratio P = 0.4167; the population size NP is the same as the colony size, namely 50.

  • ALO parameter settings: The population size in this method is also set to 50, and the vector a (linear decrease) is set to 2.

  • ACO parameter settings: The ACO method involves many parameters, which were set as follows: pheromone update constant Q = 20, local pheromone decay rate ρl = 0.5, global pheromone decay rate ρg = 0.9, exploration constant q0 = 1, pheromone sensitivity s = 1, visibility sensitivity β = 5, and initial pheromone value τ0 = 1E−6.

  • EHO parameter settings: This algorithm is based on three parameters: the scale factor α = 0.5, β = 0.1, and the number of clans nClan = 5.

  • HS parameter settings: The HS algorithm involves many parameters, which were set as follows: harmony memory consideration rate = 0.95, pitch adjustment rate = 0.1, fret width damp ratio = 0.1, harmony memory size = 50, and number of new harmonies = 50.

  • ES parameter settings: This method is based on two parameters: the number of offspring λ = 10 produced in each generation and the standard deviation σ = 1 for changing solutions.

  • SCA parameter settings: The population size in this method is also set to 50, and the vector a (linear decrease) is set to 2.

  • WOA parameter settings: The WOA algorithm has one parameter, the vector a (linear decrease), which in this work decreases from 2 to 0.
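For reference, the settings above can be collected into a single configuration sketch; the key names are illustrative shorthand, while the values are those reported in this subsection:

```python
# Hypothetical configuration table mirroring Sect. 3.2 (names are shorthand).
PARAMS = {
    'common': {'SN': 50, 'D': 'number of dataset features'},
    'ABC': {'colony_size': 100, 'limit': 100},
    'MBO': {'BAR': 0.4167, 'S_max': 1.0, 'Peri': 1.2, 'p': 0.4167, 'NP': 50},
    'ALO': {'pop_size': 50, 'a': 2},
    'ACO': {'Q': 20, 'rho_local': 0.5, 'rho_global': 0.9,
            'q0': 1, 's': 1, 'beta': 5, 'tau0': 1e-6},
    'EHO': {'alpha': 0.5, 'beta': 0.1, 'nClan': 5},
    'HS': {'HMCR': 0.95, 'PAR': 0.1, 'FW_damp': 0.1, 'HMS': 50, 'n_new': 50},
    'ES': {'lambda_': 10, 'sigma': 1},
    'SCA': {'pop_size': 50, 'a': 2},
    'WOA': {'a': '2 -> 0 (linear decrease)'},
}
```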

3.3 Performance Measures for IDS

We compared and evaluated the proposed model based on three performance measures: the detection rate (DR), the false positive rate (FPR), and the accuracy rate (Acc). These main factors are calculated from the basic performance measurements for IDS, namely true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). These four criteria are collected from the confusion matrix, whose dimension equals the number of classes and which summarizes the classification results. Table 4 shows the confusion matrix for 2-class classification; a short sketch computing the measures from its counts follows the definitions below.

Table 4 A confusion matrix for binary classification

The abbreviations of the confusion matrix for 2-class classification are as follows:

  • True positive (TP) Indicates the amount of detected attack data that is actually attack data.

  • True negative (TN) Indicates the amount of data detected as normal that is indeed normal data.

  • False positive (FP) Represents the normal data that is detected as attack data.

  • False negative (FN) Represents the attack data that is detected as normal data.

  • ACC returns the percentage of correctly classified instances out of the total number of instances. It can be derived as follows:

    $$ ACC = \frac{TP + TN}{TP + TN + FP + FN} $$
    (18)
  • FAR is a shortened name for the false alarm rate, also known as the false positive rate. This factor is computed by dividing the number of normal instances misclassified as anomalies by the overall number of normal instances. It can be computed as follows:

    $$ FAR = \frac{FP}{FP + TN} $$
    (19)
  • DR is a shortened name for the detection rate, also known as the true positive rate (TPR) or recall. It is the ratio of successfully classified anomalies to the overall number of anomalous instances. It reflects the reliability of the new approach in detecting abnormality among all the abnormal instances. It can be computed as follows:

    $$ DR = \frac{TP}{TP + FN} $$
    (20)
  • Specificity is also known as the true negative rate (TNR). It is the ratio of successfully classified non-anomalies to the total number of non-anomalous instances. It reflects the precision of the new approach in detecting non-anomalies among all the non-anomalous instances. It can be computed as follows:

    $$ Specificity = \frac{TN}{TN + FP} $$
    (21)
  • Precision is the ratio of instances correctly identified as attacks among all instances flagged as attacks by the new approach. It is another information retrieval term and is often paired with “Recall”.

    $$ Precision = \frac{TP}{TP + FP} $$
    (22)
  • F-Measure is also known as the F-score; it is the standard F-measure and is computed as a tradeoff between the precision and recall of both classes. It is computed for the binary case and is a measure of a test’s accuracy, defined as the weighted harmonic mean of the precision and recall of the new approach.

    $$ F - measure = \frac{2 \times Recall \times Precision}{Recall + Precision} $$
    (23)
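All six measures of Eqs. (18)–(23) follow directly from the four confusion-matrix counts, as in this short sketch (assuming no zero denominators):

```python
def ids_metrics(TP, TN, FP, FN):
    """IDS performance measures from confusion-matrix counts, Eqs. (18)-(23)."""
    acc = (TP + TN) / (TP + TN + FP + FN)  # accuracy, Eq. (18)
    far = FP / (FP + TN)                   # false alarm rate, Eq. (19)
    dr = TP / (TP + FN)                    # detection rate (recall), Eq. (20)
    spec = TN / (TN + FP)                  # specificity, Eq. (21)
    prec = TP / (TP + FP)                  # precision, Eq. (22)
    f1 = 2 * dr * prec / (dr + prec)       # F-measure, Eq. (23)
    return {'ACC': acc, 'FAR': far, 'DR': dr,
            'Specificity': spec, 'Precision': prec, 'F-measure': f1}
```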

4 Results and Discussion

The design elements explained in the previous section are used to implement a multilayer-perceptron training algorithm, HAMMLP, named after the metaheuristic developed in our previous study [34, 35]. The MLP suffix highlights the fact that the algorithm is used to train an MLP (for the purpose of intrusion detection). This section is devoted to the results of evaluating this algorithm against a number of standard IDS datasets. Before listing the results, we explain the common setup in which all experimental evaluations were conducted.

First, each evaluation experiment compares ten algorithms, including the proposed algorithm, against three different IDS benchmark datasets: KDD Cup 99, ISCX 2012, and UNSW-NB15. These datasets range from the old to the very recent and were introduced in Sect. 3.1. For KDD Cup 99 we used a subset divided into 4 sets of data, one for training (Dataset 1) and three (Dataset 2, Dataset 3, Dataset 4) for testing; each set has around 4000 randomly chosen records from the original dataset. For the ISCX 2012 dataset there is a set of five experiments, as the dataset comprises subsets of traffic on five different days. The compared algorithms are ABC, ACO, ALO, EHO, ES, HS, MBO, SCA, and WOA. Each of these algorithms is applied to training an MLP and is trained as well as tested using the above datasets (each experiment used a training dataset and a testing dataset).

Second, the results of each experiment are presented in three forms: a table that lists the numerical values of the performance indicators for each algorithm; a plot that visually represents the performance of each algorithm; and a set of confusion matrices for all algorithms together with the proposed HAMMLP. Each algorithm was run for a maximum of 50 iterations, and the results are averaged over 10 runs. The aforementioned performance indicators include the accuracy ACC, detection rate DR, false alarm rate FAR, precision, specificity, and F-Measure. The FAR, DR, and ACC are calculated from the basic instance types: true positives TP, false positives FP, true negatives TN, and false negatives FN. The definitions of these types are given in Sect. 3.3, while the definitions of all performance indicators are given in Eqs. (18)–(23).

Third, most of the benchmarking datasets contain data with different ranges; hence, there is a need to normalize feature values so that they can be effectively applied for training MLPs. The min–max normalization method was used, for which the formula is shown in Sect. 3.1.4 in Eq. (17) above.

Finally, one of the most important factors that influence the outcome of a neural network is the network structure in terms of the number of nodes in the hidden layer(s). For all experiments in this work, the formula shown in Eq. (24) is used to determine the number of nodes in the hidden layer of the trained MLP, where N is the number of attributes in the dataset (number of input nodes) and H is the number of hidden nodes; for example, KDD Cup 99 with N = 41 gives H = 83.

$$ H = 2 \times N + 1 $$
(24)

As mentioned earlier, the results of evaluating the proposed algorithm comprise seven sets: one for the evaluation against the KDD Cup 99 dataset, five for the ISCX 2012 subsets (one subset per day of network traffic over five days), and a final set for the UNSW-NB15 dataset. Each set of results comprises a table, a plot, and a confusion matrix. The following subsections list the results by dataset.

4.1 KDD Cup 99 Results

Using the classic KDD Cup 99 dataset, Tables 5, 6, 7 and 8 show the detailed performance measurements of every compared algorithm when detecting anomalies via an MLP trained by the algorithm itself. As mentioned previously, we have four sets of data, and the experiment is conducted four times, each time using one set for training and the remaining three for testing, to evaluate the performance of the proposed approach. The results of our proposed algorithm are shown shaded in bold. Initially our experiments used dataset 1 for training and datasets 2, 3, and 4 for testing; to make the evaluation more thorough, we also used datasets 2, 3, and 4 in turn for training, with the remaining datasets used for testing. Tables 5, 6, 7 and 8 show the results of the twelve experiments conducted during the evaluation of our proposal on the KDD Cup 99 dataset, summarized in Figs. 5 and 6, respectively.

Table 5 Evaluation of the performance of 10 algorithms used to train an MLP, by training on dataset 1 and testing on datasets 2, 3, and 4
Table 6 Evaluation of the performance of 10 algorithms used to train an MLP, by training on dataset 2 and testing on datasets 1, 3, and 4
Table 7 Evaluation of the performance of 10 algorithms used to train an MLP, by training on dataset 3 and testing on datasets 1, 2, and 4
Table 8 Evaluation of the performance of 10 algorithms used to train an MLP, by training on dataset 4 and testing on datasets 1, 2, and 3
Fig. 5 The average of the evaluation variables (ACC, FAR, and DR) for the four datasets

Fig. 6 The confusion matrices for ABCMLP, MBOMLP, WOAMLP, and HAMMLP against the KDD Cup 99 dataset by training across dataset 1 and testing across dataset 2

The results in Tables 5, 6, 7 and 8 are calculated based on the definitions in Sect. 3.3 and Eqs. (18)–(23). The TP, TN, FN and FP measurements are averaged over 10 runs; the remaining columns are derived from these basic measurements. The most important indicators are the classification accuracy, the false alarm rate, and the detection rate. It is evident from the results that our proposed algorithm is among the top performing MLP trainers. In particular, HAMMLP is the best among all the compared algorithms, as shown in Table 9, which reports the averages of the classification accuracy, the false alarm rate, and the detection rate over the four datasets. The results of our proposed algorithm are shown shaded in bold. In the listed experiments it ranked first with respect to accuracy at around 87.19%, first with respect to detection rate at 90.89%, and first with respect to false alarm rate at 0.1670 (this is shown in Fig. 5).

Table 9 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the KDD Cup 99 dataset

The comparative performance of the algorithms against the whole KDD Cup 99 dataset, in terms of the averaged ACC, DR and FAR measurements, is also depicted visually in Fig. 5. Another useful descriptor of the performance of classification models is the confusion matrix, a tabular layout for visualizing the performance of supervised classifiers. The contents of this matrix are the basic measurements TP, TN, FN, and FP, which result from mapping the classifier’s correct and wrong predictions for both positive (attack) and negative (normal) instances of the testing data. The general template of a confusion matrix is shown in Table 4.

4.2 ISCX 2012 Results

Similar to the previous set of results, this section presents the numerical performance measurements and their visual representation for the 10 compared metaheuristics, including our proposal, when running against the ISCX 2012 intrusion detection benchmark dataset. As before, sample confusion matrices are also given for the proposed algorithm. However, the ISCX 2012 dataset differs from the other datasets in that, owing to its large size, it was divided into a number of subsets, each corresponding to the traffic collected on a single day. Five days were used in the experimental evaluation of the tested metaheuristics: the 12th, 13th, 14th, 15th and 17th. The respective subsets are named ISCX 2012-12, ISCX 2012-13, ISCX 2012-14, ISCX 2012-15, and ISCX 2012-17. Consequently, the results of this section include five sets, one for each ISCX 2012 subset.

Tables 10, 11, 12, 13 and 14 list the detailed performance measurements of the 10 evaluated algorithms, in the form of one table per day. The scores of our proposed algorithm are shown in bold, and the last three columns show the rank of each algorithm with respect to the three main performance indicators: ACC, DR and FAR. The results for this dataset are quite distinctive compared with those of the other datasets (KDD Cup 99 and UNSW-NB15). On the one hand, several algorithms perform outstandingly on most of the days; for example, accuracies of 100% and false alarm rates of zero appear in several rows for the 12th, 14th, 15th and 17th days. On the other hand, unlike on the other datasets, our proposal outperformed the other algorithms used to train the MLP on nearly all days. The superior performance of our algorithm on this dataset in particular is, it is fair to say, remarkable.

Table 10 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the ISCX 2012-12 dataset
Table 11 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the ISCX 2012-13 dataset
Table 12 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the ISCX 2012-14 dataset
Table 13 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the ISCX 2012-15 dataset
Table 14 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the ISCX 2012-17 dataset

For the 12th day (Table 10), the results show that our algorithm achieved the perfect score of 100% accuracy and 100% detection rate, as well as zero false alarms. It also outperforms the rest of the algorithms in terms of specificity, precision, and F-measure. Besides our model (HAMMLP), two models were able to record the perfect score of 100% detection rate: ABCMLP and HSMLP, which are ranked equal on that indicator. HSMLP takes the second rank in accuracy and false alarm rate, followed directly by SCAMLP. Conversely, the EHOMLP and ESMLP algorithms recorded zero false alarms but were not able to achieve perfect results in terms of accuracy and detection rate. This outcome is unusual and appears particular to this subset of the data.

The results for the second subset, the 13th day, are less impressive (Table 11). In terms of accuracy, HAMMLP performed best at 94.69%, followed by ESMLP at 90% and then ABCMLP at 88.53%; EHOMLP is ranked tenth at an accuracy of 30.37%. In terms of detection rate, HAMMLP is ranked first (92.07%), followed by WOAMLP (91.60%); EHOMLP is again ranked tenth, with a detection rate of 0.66%. In terms of false alarm rate, HAMMLP is ranked first, followed by the HSMLP and MBOMLP algorithms. Overall, combining the three performance indicators under the assumption that they are of equal importance, HAMMLP performed best on the 13th-day ISCX 2012 subset, followed by ESMLP and then ABCMLP.
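One plausible way to combine the three indicators with equal importance is to average the per-indicator ranks, as in the following sketch; this is an illustration of the equal-weight assumption, not necessarily the exact scheme used here. Lower FAR is better, while higher ACC and DR are better.

```python
def combined_ranks(scores):
    """scores: {algorithm: (acc, dr, far)}. Returns algorithms ordered by
    the equal-weight average of their per-indicator ranks (1 = best)."""
    algos = list(scores)

    def rank_by(key, reverse):
        ordered = sorted(algos, key=key, reverse=reverse)
        return {a: r + 1 for r, a in enumerate(ordered)}

    acc_rank = rank_by(lambda a: scores[a][0], reverse=True)   # higher ACC is better
    dr_rank = rank_by(lambda a: scores[a][1], reverse=True)    # higher DR is better
    far_rank = rank_by(lambda a: scores[a][2], reverse=False)  # lower FAR is better
    avg = {a: (acc_rank[a] + dr_rank[a] + far_rank[a]) / 3 for a in algos}
    return sorted(algos, key=lambda a: avg[a])
```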

On the dataset of the 14th day (Table 12), our proposed model is ranked at the top, with superior performance across the three main performance indicators: ACC, DR and FAR. HAMMLP is the best performing algorithm here, with the maximum scores of 100% accuracy, 100% detection rate, and zero false alarm rate. WOAMLP and SCAMLP follow our proposal on the two indicators ACC and DR, with accuracies of 98.91% and 98.14% and detection rates of 99.76% and 99.60%, respectively. For the false alarm rate on this subset, however, the second and third ranks went to two other algorithms, ESMLP and ALOMLP, with FAR scores of 0.0012 and 0.0016, respectively.

The results of the following day (Table 13) show that our algorithm remains stable in its ability to obtain the best results compared with the rest of the algorithms. In addition, the other algorithms recorded their best results on this subset relative to the previous subsets, the exception being the ABCMLP algorithm, which records the worst result on the two main performance indicators ACC and DR.

Lastly, the results of the 17th day (Table 14) differ from those of the previous subsets, deviating from their usual pattern. On this subset, ALOMLP took the first rank in performance, with an accuracy of 98.71% and a detection rate of 100%, though only the seventh rank in false alarm rate at 0.0193. Our proposal HAMMLP follows closely: it takes the second rank, with an accuracy of 98.58% and a detection rate of 99.19%, and the sixth rank in false alarm rate at 0.0173. HAMMLP thus shows relatively inferior performance on this final subset, both compared with its previous scores and with the other algorithms. WOAMLP, meanwhile, recorded the third rank in ACC and DR at 98.17% and 94.52%, respectively, and the first rank in FAR with zero false alarms. These last results suggest an important point: HAMMLP achieves more impressive results than the other nine algorithms on the ISCX 2012 dataset as a whole. It is generally more consistent across the various data subsets, and sometimes even the absolute best. This conclusion is also consistent with the results for the other benchmarking datasets. Figure 7 shows the confusion matrices, produced in MATLAB, for the best three algorithms on each of the five days; the general template of a confusion matrix is shown in Table 4.

Fig. 7 The confusion matrices for ABCMLP, ALOMLP, ESMLP, MBOMLP, SCAMLP, WOAMLP, and HAMMLP against the ISCX 2012 dataset

4.3 UNSW-NB15 Results

Finally, this section presents the numerical performance measurements, and their visual representation, for the most recent benchmark, the UNSW-NB15 intrusion detection dataset. A sample of four confusion matrices is also given, covering our proposed algorithm and the other best performing algorithms. These results are shown in Table 15 and Figs. 8 and 9, respectively. The results of our proposed algorithm are shown in bold.

Table 15 Performance measurements of 10 algorithms used to train an MLP to detect anomalies in the UNSW-NB15 dataset
Fig. 8 The performance of 10 MLP trainer algorithms for the UNSW-NB15 dataset

Fig. 9 The confusion matrices for ACOMLP, SCAMLP, WOAMLP, and HAMMLP against the UNSW-NB15 dataset

Confirming all previous results, HAMMLP is the top performing algorithm on this dataset too. Looking at the last three columns of Table 15, which show the ranks per ACC, DR, and FAR, HAMMLP is ranked first on all three indicators, with scores of 95.72%, 96.41% and 0.0507, respectively. HAMMLP is followed by WOAMLP, at an accuracy of 92.52%, a detection rate of 92.01% and a false alarm rate of 0.0701; this algorithm is ranked 2nd in ACC, 3rd in DR, and 3rd in FAR. SCAMLP closely follows, with ACC, DR and FAR of 90.75%, 92.50% and 0.1140, respectively; it is ranked 3rd in ACC, 5th in DR and 2nd in FAR. ESMLP, on the other hand, performs relatively poorly on the UNSW-NB15 dataset compared with the remaining algorithms, with an accuracy of 80.34%, a detection rate of 78.23% and a false alarm rate of 0.1793.

Consistently, across all result sets in these experiments on the different benchmarking datasets, HAMMLP outperformed all other algorithms in the set of 10 compared metaheuristics. This result is central to this article: since the algorithm was evaluated as an MLP trainer against real and varied IDS benchmarking datasets, which is the very purpose of the article, it justifies selecting HAMMLP for the last step towards a complete IDS solution: feature selection.

Algorithms for training ANNs need not only strong exploration ability but also precise exploitation ability. The classification accuracy, DR and FAR results obtained by ABCMLP, ACOMLP, ALOMLP, EHOMLP, ESMLP, HAMMLP, HSMLP, MBOMLP, SCAMLP, and WOAMLP show that HAMMLP performs far better than the others thanks to the more precise exploitation ability afforded by the HAM algorithm, while the others still suffer from becoming trapped in local minima, a weakness that leads to unstable performance. The results obtained by HAMMLP demonstrate that it has both strong exploitation and good exploration abilities. In other words, the strengths of the constituent algorithms have been successfully combined, giving outstanding performance in MLP training. HAMMLP is thus capable of avoiding the long-standing problem of becoming trapped in local minima while achieving a fast convergence speed.
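The role of exploration and exploitation here is easiest to see in how a metaheuristic trains an MLP: each candidate solution is a flattened weight vector, and the fitness is the classification error on the training data. The following is a minimal sketch of that encoding for a single-hidden-layer MLP under those assumptions; it illustrates the general technique, not the HAM implementation itself.

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """Single-hidden-layer MLP with sigmoid activations."""
    h = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

def fitness(candidate, X, y, n_in, n_hidden):
    """Decode a flat candidate vector into MLP weights and return the
    mean squared classification error (lower is better)."""
    i = 0
    w1 = candidate[i:i + n_in * n_hidden].reshape(n_in, n_hidden)
    i += n_in * n_hidden
    b1 = candidate[i:i + n_hidden]
    i += n_hidden
    w2 = candidate[i:i + n_hidden].reshape(n_hidden, 1)
    i += n_hidden
    b2 = candidate[i:i + 1]
    pred = mlp_forward(X, w1, b1, w2, b2).ravel()
    return np.mean((pred - y) ** 2)
```

A metaheuristic then searches the space of such candidate vectors, and the balance between exploration (sampling new regions of weight space) and exploitation (refining the best candidates found so far) determines whether training escapes local minima.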

Table 16 compares the performance of the proposed approach with that of other approaches on the KDD Cup 99, ISCX 2012 and UNSW-NB15 datasets. The proposed model clearly performs best in terms of ACC, DR and FAR: it correctly classifies more data than the other approaches and exhibits a significantly lower FAR. The proposed method therefore offers notably strong exploration and precise exploitation capabilities, and the proposed training algorithm HAM is effective and feasible for application to IDS research.

Table 16 Performance comparison of HAMMLP approach with other intrusion approaches

5 Conclusion and Future Work

In this article a new approach for an intrusion detection system, namely an MLP trained by HAM, has been proposed and presented. The study focused mainly on the applicability of the new hybrid algorithm HAM to training an MLP. The basic confusion-matrix measurements (TP, TN, FN, and FP) of the proposed model were obtained using the KDD Cup 99, ISCX 2012, and UNSW-NB15 datasets. The performance of the model was compared with that of popular intrusion detection techniques, as well as with nine optimization algorithms used to train MLPs: ABC, ACO, ALO, EHO, ES, HS, MBO, SCA, and WOA. HAMMLP proved a strong candidate when trained on the KDD Cup 99, ISCX 2012, and UNSW-NB15 datasets, attaining detection rates of 90.89%, 98.25%, and 96.41%, respectively. These values are higher than those currently obtained by other methods tested on the same datasets. The results evidence the potential applicability of the proposed model for developing practical IDSs. However, this study has evaluated the models on the full feature sets of the intrusion detection datasets; an adequate feature selection technique has not yet been selected. Future work will therefore focus on minimizing the number of selected features and applying the proposed model to develop an effective IDS.