
1 Introduction

Network intrusion detection is a network security mechanism designed to detect, prevent and repel unauthorized access to a communication or computer network. An intrusion detection system (IDS) plays a crucial role in maintaining a safe and secure network. In recent years, huge volumes of network data have been generated by new network technologies and equipment, which has led to declining detection rates. The intrusion detection process is difficult and complicated in terms of detection accuracy, detection speed, the dynamic nature of networks and the processing power available for handling high volumes of data from distributed network systems [15]. In recent years, many researchers have proposed innovative approaches to this problem.

These methods can be divided into four categories in terms of the behavior and the type of resource access being detected. The first category detects anomalies based on statistical analysis, using models such as the Bayesian model [3] and the decision tree. Anomaly-based techniques build models of normal network samples and flag the samples that deviate from these models [7]; they can detect new types of attack from knowledge of normal events alone, but they also suffer from a high false-alarm rate. The second category is the outlier detection approach, in which most methods require a standard set of normal data to train a classifier and then check whether a new sample fits the model; representative algorithms include k-means, self-organizing maps and unsupervised support vector machines [8]. The third category employs AI techniques that take advantage of machine learning to detect attack types, such as SVM [5], RF [23], the genetic algorithm (GA) and artificial neural networks. The last category consists of hybrid and ensemble detection methods that integrate the advantages of different (or the same) methods in order to increase detection accuracy. These approaches include bagging, AdaBoost [19] and the PSO-K-means ensemble approach [16]. The PSO-K-means method uses K-means to detect attack types in networks and can find an optimal number of clusters, thereby increasing the detection rate. In addition, the SVM-KNN-PSO ensemble method proposed in [1] obtains strong results by combining the nonlinear processing capability of SVM with distance-based classification for each sample. However, that work relies on binary classification, which can only distinguish between two states. Alom et al. [2] combined a deep belief network (DBN) with an SVM model: the DBN selects features and reduces the dimensionality of the data, and its output is fed to the SVM to capture the rules of the attack process and detect intrusions. All of the above methods assume that the features of the dataset are independent at all times, but in the real world the features of intrusion data are complex and require a comprehensive analysis.

Taking the above discussion into consideration, this paper proposes the KDSVM model, which uses the k-means algorithm to capture the features of the raw data and divide the dataset into different subsets. Each subset is then fed to an improved DNN whose top layer is replaced by an SVM, so that each network learns the different characteristics of its sub-dataset. Next, the testing data are divided into sub testing datasets according to the cluster centers obtained from the training data, and each sub testing dataset is fed to the corresponding trained DNN for intrusion detection. Through this prior learning process the DNN can acquire enough information and capture more specific rules of attack types in networks, thanks to its ability to extract features from massive and complex data [6, 13]. The DNN model, based on the theory of deep learning, can solve non-linear problems with complex and large-scale data and has been successfully applied in areas such as weather forecasting and stock prediction [10]. Experimental results on the KDD CUP99 and NSL-KDD datasets [22] show that KDSVM achieves better accuracy and is more robust than other well-known algorithms, and is well suited to parallel computing.

The rest of the paper is organized as follows. The related literature concerning IDS is reviewed in Sect. 2. Section 3 presents the proposed approach in detail and describes how it works. Section 4 describes the experimental datasets and presents the data preparation, evaluation criteria, results and discussion of the experiments. Finally, conclusions and suggestions for future work are provided in Sect. 5.

2 Literature Review

In this section, the deep neural network, a deep learning approach, is briefly introduced. When IDS is treated as a classification problem, handling the features of the dataset is very important, because a classifier acquires its knowledge and patterns from the characteristics of the data. Additionally, the level of feature representation determines the performance of a learner.

2.1 DNN Algorithm

The essence of the deep neural network is to learn more useful features by constructing a network with multiple layers and training it on vast amounts of data.

Auto-encoder: An auto-encoder is a type of unsupervised neural network with three layers [12] whose output target is its input data. The encoder network transforms the input data from a higher-dimensional space to codes in a low-dimensional space, and the decoder network reconstructs the inputs from these codes.

The encoder network is defined as an encoding function denoted by \( {\text{f}}_{encoder} \). This function indicates the encoding process:

$$ h^{m} = f_{encoder} (x^{m} ) $$
(1)

In which \( x^{m} \) stands for a data point from a dataset, and \( h^{m} \) is the encoding vector obtained from \( x^{m} \).

Decoder: The decoder network is defined as a reconstruction function denoted as \( {\text{f}}_{decoder} \). This function indicates the decoding process:

$$ \hat{x}^{m} = f_{decoder} (h^{m} ) $$
(2)

In which \( \hat{x}^{m} \) is the decoded vector obtained from \( h^{m} \). Several specific encoding and reconstruction functions can be used, including:

$$ {\text{Logsig:}}\quad f_{encoder} \left( {x^{m} } \right) = \frac{1}{{1 + e^{{ - x^{m} }} }} $$
(3)
$$ {\text{Satlin:}}\quad f_{encoder} \left( {x^{m} } \right) = \left\{ {\begin{array}{*{20}c} 0 & {if} & {\,x^{m} \le 0} \\ {x^{m} } & {if} & {0 < x^{m} < 1} \\ 1 & {if} & {x^{m} \ge 1} \\ \end{array} } \right. $$
(4)
$$ {\text{Purelin:}}\quad f_{encoder} (x^{m} ) = x^{m} $$
(5)
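
For illustration, the following is a minimal numpy sketch of the encoding and decoding steps in Eqs. (1)–(5), assuming a logsig encoder and a purelin decoder; the weight matrices and the 41-dimensional input are hypothetical placeholders for parameters that would normally be learned by minimizing the reconstruction error.

```python
# Minimal sketch of a single auto-encoder (Eqs. 1-5): logsig encoder, purelin decoder.
# W_e, b_e, W_d, b_d are illustrative random parameters, not learned weights.
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))            # Eq. (3)

def encode(x, W_e, b_e):
    return logsig(W_e @ x + b_e)               # h^m = f_encoder(x^m), Eq. (1)

def decode(h, W_d, b_d):
    return W_d @ h + b_d                       # x_hat^m = f_decoder(h^m), Eq. (2), purelin output

rng = np.random.default_rng(0)
x = rng.random(41)                             # e.g. a 41-dimensional network record
W_e, b_e = rng.standard_normal((16, 41)) * 0.1, np.zeros(16)
W_d, b_d = rng.standard_normal((41, 16)) * 0.1, np.zeros(41)
h = encode(x, W_e, b_e)                        # low-dimensional code
x_hat = decode(h, W_d, b_d)                    # reconstruction of the input
```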

Pre-training: The pre-training process proceeds sequentially until the Nth auto-encoder has been trained to initialize the final hidden layer of the DNN. In this way, all hidden layers of the DNN are initialized by the stacked auto-encoders obtained from the N training passes, and the network is regarded as pre-trained. This pre-training process has been shown to be significantly better than random initialization of the DNN and to be conducive to better generalization in classification [9, 11].
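
The following is a minimal sketch, under simplifying assumptions, of this greedy layer-wise pre-training: each auto-encoder here uses a logsig encoder, a linear decoder, plain gradient descent on the squared reconstruction error, and no bias terms; the layer sizes, learning rate and number of epochs are illustrative, not taken from the paper.

```python
# Greedy layer-wise pre-training sketch: each auto-encoder is trained on the
# codes produced by the previous one, and its encoder weights initialize the
# corresponding hidden layer of the DNN. Biases are omitted for brevity.
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train one auto-encoder on X (n_samples x n_features); return encoder weights and codes."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W_e = rng.standard_normal((n_in, n_hidden)) * 0.1
    W_d = rng.standard_normal((n_hidden, n_in)) * 0.1
    for _ in range(epochs):
        H = logsig(X @ W_e)                                   # encode
        X_hat = H @ W_d                                       # decode (purelin)
        err = X_hat - X                                       # reconstruction error
        grad_Wd = H.T @ err / len(X)
        grad_We = X.T @ ((err @ W_d.T) * H * (1 - H)) / len(X)
        W_d -= lr * grad_Wd
        W_e -= lr * grad_We
    return W_e, logsig(X @ W_e)

X = np.random.default_rng(1).random((200, 41))                # toy stand-in for training records
layer_sizes, weights, codes = [32, 16, 8], [], X
for n_hidden in layer_sizes:
    W_e, codes = train_autoencoder(codes, n_hidden)
    weights.append(W_e)                                       # pre-trained DNN hidden-layer weights
```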

Fine-tuning: Fine-tuning is the process that uses supervised learning to improve the performance of the DNN. The network is retrained on labeled training data, and the errors given by the difference between the real and predicted values are back-propagated through the whole multilayer network with the stochastic gradient descent (SGD) method. The error function minimized by SGD is defined as follows:

$$ E = \frac{1}{2}\sum\limits_{i = 1}^{n} {\left( {y_{i} - t_{i} } \right)^{2} } $$
(6)

where E is the loss function, y is the output of the network and t is the real label. The gradient of a weight parameter \( \omega \) is obtained by differentiating the error equation:

$$ \frac{\partial E}{{\partial \omega_{ij} }} = \frac{\partial E}{{\partial y_{j} }} \cdot \frac{{\partial y_{j} }}{{\partial \mu_{j} }} \cdot \frac{{\partial \mu_{j} }}{{\partial \omega_{ij} }} $$
(7)

With the gradient of \( \omega_{ij} \), the SGD weight update is defined as:

$$ \omega_{ij}^{new} = \omega_{ij}^{old} - \eta \cdot (y_{j} - t_{j} ) \cdot y_{j} (1 - y_{j} ) \cdot h_{i} $$
(8)

In which \( \eta \) is the step size, which is greater than zero, and \( h_{i} \) is the output of the ith unit of the last hidden layer in the deep network [4].

Through this process the weights and thresholds are tuned and optimized based on the labeled data, so that the deep network can learn the knowledge that is important for the final output and the parameters of the whole network are directed toward correct classification [20].
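
As an illustration of the update in Eq. (8), the sketch below applies the SGD rule to a sigmoid output layer; the hidden activations, targets and learning rate are made-up values.

```python
# Sketch of the output-layer SGD update in Eq. (8) for sigmoid output units.
import numpy as np

def sgd_update(W, h, y, t, eta=0.01):
    """W: hidden-to-output weights (n_hidden x n_out), h: hidden activations,
    y: network outputs, t: target labels. Returns the updated weights."""
    delta = (y - t) * y * (1.0 - y)              # (y_j - t_j) * y_j * (1 - y_j)
    return W - eta * np.outer(h, delta)          # w_ij_new = w_ij_old - eta * delta_j * h_i

h = np.array([0.2, 0.7, 0.5])                    # hidden-layer outputs h_i
t = np.array([1.0, 0.0])                         # real labels t_j
W = np.full((3, 2), 0.1)
y = 1.0 / (1.0 + np.exp(-(h @ W)))               # sigmoid outputs y_j
W = sgd_update(W, h, y, t)                       # one SGD step
```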

3 Proposed Approach of KDSVM

In this section, the proposed approach, which combines a clustering method and deep learning with an SVM model, is presented to solve the above problems. First, the training dataset is divided into different subsets by k-means and the cluster center of each subset is computed. Second, the k sub training datasets are used to train k DNNs, where k is the number of clusters, so that each DNN learns the different characteristics of its cluster. Third, the testing dataset is divided into sub testing datasets by assigning each test sample to the nearest cluster center obtained in the first step, and each sub testing dataset is used to detect intrusion attack types with the corresponding fully trained DNN, whose top layer is an SVM classifier. Finally, the outputs of all the DNNs are aggregated to obtain the final intrusion detection results.

3.1 The KDSVM Algorithm

The KDSVM algorithm is described in detail as follows: the cluster centers and sub training sets are generated as the output of the k-means function in line 1; the sub testing sets are obtained by computing the distances with the Huffman function in lines 2–6; the k DNNs are trained on their training sets in lines 7–12; and the sub testing sets are indexed and the final results are predicted by aggregation in lines 13–19.
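
The sketch below illustrates this pipeline with scikit-learn stand-ins, assuming numerically encoded features and integer-encoded labels: KMeans plays the role of the clustering step, MLPClassifier stands in for the DNN (without the auto-encoder pre-training stage), its last hidden-layer activations are passed to an SVC that replaces the softmax output, and test samples are routed to the model of their nearest cluster center. Layer sizes and SVM parameters are illustrative.

```python
# Hedged sketch of the KDSVM pipeline with scikit-learn stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def hidden_features(mlp, X):
    """Forward pass through the hidden layers of a fitted MLPClassifier (relu activation)."""
    A = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        A = np.maximum(A @ W + b, 0.0)
    return A

def train_kdsvm(X_train, y_train, k=4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)     # step 1: cluster the training data
    models = []
    for c in range(k):                                                    # step 2: one DNN + SVM per cluster
        idx = km.labels_ == c
        dnn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300,
                            random_state=0).fit(X_train[idx], y_train[idx])
        svm = SVC(kernel='rbf').fit(hidden_features(dnn, X_train[idx]), y_train[idx])
        models.append((dnn, svm))
    return km, models

def predict_kdsvm(km, models, X_test):
    clusters = km.predict(X_test)                                         # step 3: route by nearest cluster center
    y_pred = np.empty(len(X_test), dtype=int)
    for c, (dnn, svm) in enumerate(models):
        idx = clusters == c
        if idx.any():
            y_pred[idx] = svm.predict(hidden_features(dnn, X_test[idx]))  # step 4: aggregate per-cluster outputs
    return y_pred
```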

4 Experiments

The experiments examine the proposed model and compare it with other detection models, for instance SVM, BPNN, DBN-SVM and naive Bayes. Six datasets drawn from KDD CUP99 and NSL-KDD are used to evaluate the performance of all models. Then, the number of clusters and the weights of the DNN are discussed and analyzed.

4.1 The Dataset

In this research, six datasets, called Dataset1 to Dataset6, are randomly generated from the two datasets KDD CUP'99 and NSL-KDD in order to reduce the amount of data [18]; they are shown in Table 1.

Table 1 The distributions of the training and testing sets for the six datasets drawn from KDD'99 and NSL-KDD

The six new datasets are used to evaluate the performance of the KDSVM algorithm and to compare it with other detection methods, such as SVM, BPNN, DBN-SVM, and naive Bayes [14].
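
As a minimal sketch of this data preparation step, the code below draws a random subset from a preprocessed feature matrix; the array names and sample sizes are assumptions, since the exact sampling procedure is not specified here.

```python
# Illustrative random sub-sampling of a preprocessed dataset (X: features, y: labels).
import numpy as np

def sample_subset(X, y, n_samples, seed):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_samples, replace=False)
    return X[idx], y[idx]

# e.g. Dataset1..Dataset3 drawn from KDD'99, Dataset4..Dataset6 from NSL-KDD (hypothetical sizes):
# datasets = [sample_subset(X_kdd, y_kdd, 10000, seed=s) for s in range(3)] + \
#            [sample_subset(X_nsl, y_nsl, 10000, seed=s) for s in range(3)]
```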

4.2 Evaluation Methods

In this study, Accuracy, Recall, and Error Rate (ER) are used to evaluate the performance of the detection models. These criteria are calculated as follows [17]:

$$ {\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN} $$
(9)
$$ {\text{Recall}} = \frac{TP}{TP + FN} $$
(10)
$$ {\text{Error}}\,{\text{Rate}} = \frac{FP + FN}{FP + TP + TN + FN} $$
(11)

In which True Positives (TP) denotes the number of attack records correctly detected as attacks, True Negatives (TN) denotes the number of normal records correctly classified as normal, False Negatives (FN) denotes the number of attack records classified as normal, and False Positives (FP) denotes the number of normal records classified as attacks. Accuracy measures the overall proportion of correct detections on the dataset, ER reflects the robustness of the classifier, and Recall indicates the proportion of attack records that are correctly detected among all attack records. In these terms, higher Accuracy and Recall and a lower ER represent better performance.
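
The sketch below computes these three criteria from raw TP, TN, FP and FN counts; the counts in the usage example are hypothetical.

```python
# Evaluation criteria of Eqs. (9)-(11) computed from confusion-matrix counts.
def evaluate(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total           # Eq. (9)
    recall = tp / (tp + fn)                # Eq. (10)
    error_rate = (fp + fn) / total         # Eq. (11)
    return accuracy, recall, error_rate

print(evaluate(tp=900, tn=950, fp=50, fn=100))   # hypothetical counts
```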

4.3 Experiments with KDSVM

In this section, the cluster number k of KDSVM is evaluated on the six datasets, because the suitable range of k differs for each dataset and has a serious impact on the precision of the KDSVM results. Next, the testing datasets are used to compare the performance of the five models.
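
A simple way to explore the cluster number, sketched below under the assumption that the train_kdsvm and predict_kdsvm functions from the Sect. 3.1 sketch are in scope, is to train KDSVM for a range of k values on a held-out split and keep the k with the highest accuracy; the range of k is an assumption.

```python
# Illustrative selection of the cluster number k by validation accuracy.
import numpy as np

def select_k(X_train, y_train, X_val, y_val, k_values=range(2, 9)):
    best_k, best_acc = None, -1.0
    for k in k_values:
        km, models = train_kdsvm(X_train, y_train, k=k)       # from the Sect. 3.1 sketch
        acc = np.mean(predict_kdsvm(km, models, X_val) == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```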

Results and Comparisons

In this section, the confusion matrix and the evaluation criteria are calculated for KDSVM and the other four traditional detection models on the six datasets, respectively. The experimental results of the above algorithms on the six datasets are shown in Table 2 and Fig. 1.

Table 2 Comparison of the results of network intrusion detection on the six datasets (%)
Fig. 1 The prediction accuracy histograms of the five models, SVM, BPNN, DBN-SVM, Bayes and KDSVM, compared on the six datasets in different colors

In Table 2, the ACC column denotes the average accuracy of each model. The records in the six datasets are unbalanced: the Normal and DoS types make up the major part of the data, while U2R and R2L are sparsely distributed, because these last two types correspond to intrusions that have already obtained advanced user rights; such intrusions are more covert and therefore more difficult to detect.

From Table 2 and Fig. 1, in terms of accuracy, KDSVM is more accurate than the other four methods and has the lowest error rates on the datasets.

4.4 Discussion

The overall accuracy is used to generate the histograms comparing the results on the six datasets, shown in Fig. 2. This provides a more detailed evaluation of the classification performance over the five classes (one normal and four attack types).

Fig. 2 The histograms of the average precision of the five models compared on the six datasets

From the above, the results show that the KDSVM algorithm is good at detecting the Normal, DoS and Probe cases in the six datasets. Moreover, for the sparse and difficult cases of U2R and R2L, the KDSVM model also obtains higher accuracy.

5 Conclusion

Low-frequency attack events are usually difficult to predict, yet they can cause severe threats to networks. This paper puts forward an innovative approach that takes advantage of k-means and a hybrid deep neural network whose top layer is an SVM classifier to detect attack types. In the first stage, the network dataset is clustered and divided into k sub-datasets in order to find more knowledge and patterns within similar clusters. In the second stage, highly abstract information is extracted from these subsets by the deep learning networks. Finally, the DNNs, in which the softmax layer is replaced by an SVM classifier, are used to detect the attack cases in the testing subsets. This is an efficient way to improve detection accuracy. The experimental results show that KDSVM performs better than SVM, BPNN, DBN-SVM and Bayes, with the best accuracy over the six datasets. Moreover, the proposed algorithm is more capable of classifying sparse attack cases and effectively improves detection accuracy in real security systems. However, a limitation of KDSVM is that the weights and thresholds of every DNN layer and the SVM parameters need to be optimized by heuristic algorithms, which will be studied in future work.