Abstract
As a proactive network security protection scheme, network intrusion detection system (NIDS) has become a powerful tool for early warning of computer and communication systems attacks. However, traditional machine learning methods struggle to pay attention to both spatial and temporal features of network traffic simultaneously, resulting in poor detection performance. In this paper, we propose SR-IDS, an Intrusion Detection System based on Self-taught learning and Representation learning, which consists of one-dimensional stacked convolutional autoencoders (1D-SCAE) and bidirectional gated recurrent units (BiGRU). SR-IDS can extract spatial features through 1D-SCAE and abstract temporal features via BiGRU. It uses self-taught learning and representation learning to simultaneously focus on the spatial and temporal characteristics of network traffic, overcoming the challenges of traditional methods in feature extraction. Experiments show that our SR-IDS model can distinguish the network traffic with 98.90% accuracy on the UNSW-NB15 dataset.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Network intrusion detection system (NIDS) monitors all traffic in the network and detects each data packet passing through the web. Many researchers have begun studying intrusion detection techniques to deal with network attacks effectively. In classification problems, machine learning algorithms perform feature extraction to identify malicious behaviors in network traffic [8]. However, the statistical characteristics of traffic have changed considerably in terms of network architectures and applications today. Traditional machine learning methods have been powerless to efficiently and accurately abstract spatial and temporal features of abnormal traffic.
Self-taught learning is a typical machine learning framework for using unlabeled data in supervised classification tasks [22]. The method does not require the assumption that unlabeled data follows the same distribution as labeled data. Besides, representation learning analyzes the characteristic of data that makes it easier to extract helpful information when building predictors [5]. Inspired by the above ideas, we develop a noval network intrusion detection system based on self-taught learning and representation learning.
General traffic features can be divided into two categories: spatial features, such as data packet features, and temporal features, such as network flow features. NIDS often struggles to broaden the horizon and jump out of the local optimum solution when using only spatial or temporal features [27]. In this paper, we design one-dimensional stacked convolutional autoencoders (1D-SCAE), an excellent self-taught learning model which abstracts spatial features by reducing the dimensionality of complex data signals. Besides, bidirectional gated recurrent units (BiGRU) can extract temporal features of traffic sequences in representation learning. Therefore, we propose a deep neural network model based on 1D-SCAE and BiGRU, which can accurately extract spatial and temporal features and enhance the performance of malicious traffic detection. The main contributions of the proposed work include the following:
-
We design 1D-SCAE—an improved network traffic spatial feature extraction model, which uses sparse regularization to reduce overfitting by invalidating a certain part of active neurons. The greedy layer-wise strategy is adopted to achieve the best detection performance.
-
We propose a BiGRU-based temporal feature extraction model that utilizes TimeseriesGenerator to generate and model traffic time series. It can acquire both memories from history and information from the future.
-
We develop SR-IDS, a network intrusion detection system that simultaneously focuses on network traffic’s spatial and temporal characteristics. Experiments show that the accuracy of SR-IDS on the UNSW-NB15 dataset can reach 98.90\(\%\).
-
We discuss different hyperparameters to determine the optimal model architecture. Furthermore, we compare the detection performance of different RNN variants.
The rest of the paper is organized as follows. The related work on NIDS is reviewed in Sect. 2. Then we present the details of the proposed SR-IDS in Sect. 3. The accuracy and the efficiency of SR-IDS are verified in Sect. 4 by comparing it with several state-of-the-art IDS algorithms. Finally, we provide our conclusions and discuss the future work in Sect. 5.
2 Related Work
NIDS is a necessary foundation and premise for dealing with complex network attacks and identifying malicious traffic behavior. The deep learning models currently applied to network anomaly detection include two categories: generative intrusion detection model and discriminative intrusion detection model.
2.1 Generative Intrusion Detection Model
Generative models often adopt an advanced hierarchical learning method to establish a multi-level model, which can flexibly analyze and restore joint probability distribution. The current famous generative model architecture mainly includes autoencoder and its variants [18].
Amir et al. [4] designed a new lightweight architecture that considers feature separation and uses surrounding information of a single value in the feature vector. The accuracy is improved while reducing the memory footprint and the need for processing power. Iliyasu et al. [12] achieved a few-shot learning intrusion detection, which uses the feature extraction model in the few-shot learning stage to fit a classifier with a small number of novel attack samples. Long et al. [17] proposed a network intrusion detection model based on an integrated autoencoder. It uses recursive feature addition to select the optimal subset of features, which can significantly reduce the training time and improve the intrusion detection performance.
2.2 Discriminative Intrusion Detection Model
Discriminative models are usually based on the excellent classification of heterogeneous data to achieve the best recognition. The common discriminative model structures mainly include recurrent neural networks and convolutional neural networks [2].
Imrana et al. [14] proposed a novel feature-driven intrusion detection system. The model first utilizes a statistical model to rank all the features, then uses best-first-search algorithm to search for the best subset, and finally classifies testing data based on the best subset. Sahu et al. [23] proposed a multi-classification intrusion detection method based on LSTM and fully connected networks. This method accurately classifies the imbalanced intrusion data. Imrana et al. [13] used an improved RNN model for network intrusion detection, which can be associated with the feature knowledge and accurately classify unknown data.
Several works sought to propose ML-based solutions with consideration of as many essential features as possible, and the approaches managed to obtain interesting results. However, there are still some challenges in extracting both spatial and temporal traffic features. Inspired by existing research progress, we propose SR-IDS—a new intrusion detection system with the advantages of generative models and discriminative models. Moreover, it can serially extract the spatial and temporal features of network traffic accurately.
3 The Proposed Model
In this section, we introduce how SR-IDS works. SR-IDS first preprocesses the UNSW-NB15 dataset, including one-hot encoding and normalization. Afterward, SR-IDS uses 1D-SCAE to extract spatial features of network traffic, and the greedy layer-wise strategy is adopted to pre-train the neural network. Finally, SR-IDS uses BiGRU to extract temporal features of network traffic. BiGRU accepts input from pre-trained 1D-SCAE and outputs to the binary classifier. Figure 1 describes the framework of our proposed SR-IDS model.
3.1 Data Preprocessing
In general, machine learning models can only process meaningful numerical data, but the actual data differs from what we expected. In order to enable machine learning models to process and analyze traffic data, assigning numerical meaning to features is necessary. One-hot encoding is a commonly used feature encoding method.
One-hot encoding expresses a specific type of different values in binary vectors. The N values used for encoding correspond to the states of N registers one by one. Only one bit in any form is activated, and the rest of the registers are inactive. The specific representation is generally \(v_i=\{0,1,0,\dots 0,0\}\), and the dimension of the vector is equal to the number of possible values N of the eigenvalues to be encoded.
After encoding, we use the min-max method to standardize network traffic samples. With a fixed output range, the min-max method performs a linear operation on the sequence \(\{x_1,x_2,\dots ,x_n\}\). After transformation, the new sequence \(\{y_1,y_2,\dots ,y_n\}\in ( 0,1)\) are dimensionless:
It can be found that min-max forces the original input data to distribute in [0, 1], and the normalized scale transformation is only related to extreme values.
3.2 Spatial Feature Extraction
Spatial features of network traffic refer to feature sets related to packets, for example, packet size and number. We design a 1D-SCAE for spontaneously learning spatial feature representation, and Fig. 2 describes the architecture. In each layer, the autoencoder convolves the features of the lower layers to produce a high-level representation. The whole methodology is shown as follows:
where \(M_j\) represents the input feature map, l represents the l-th layer in 1D-SCAE, and k is the convolution kernel. f represents the activation function, and \(b_j^l\) is the bias vector.
The 1D-SCAE consists of three convolutional autoencoders, and their encoder layers are stacked in the model construction process to build the complete 1D-SCAE model. After the training is completed, we discard the decoders and connect the last encoder layer to the subsequent temporal extraction model, which will be explained in the next subsection. MSE Loss is used to evaluate the effect of feature extraction and input reconstruction as follows:
where i is the sample index, \(x_i\) is the original input data, and \(x'_i\) is the reconstructed data after dimensionality reduction by 1D-SCAE.
We also add a custom regularization term in 1D-SCAE to improve the generalization performance of the model. The principle is that different inputs cause different neurons to be activated, making neurons better dependent on data. In general, the constant \(\rho \) is the proportion of activated neurons, which is used to measure the average activity \(\hat{\rho }\) of the activation degree of neurons:
where N is the number of neurons in the hidden layer, \(\varTheta \) is the corresponding neuron transformation. In the field of machine learning, forward KL divergence is often used as the training cost to measure the difference between two probability distributions. Forward KL divergence makes sure \(\hat{\rho }\) close to \(\rho \), and the regularization term punishs the deviation between \(\hat{\rho }\) and \(\rho \):
If \(\hat{\rho }\) is equal to \(\rho \), the KL divergence is 0; otherwise, it will gradually increase as the difference between \(\rho \) and \(\hat{\rho }\) increases. Therefore, the error function \(J^{'}\) in the sparse autoencoder is shown as follows:
where J is the error when no sparse item is added, and \(\mu \) is the impact factor used to balance the weight of KL divergence in the entire loss function.
3.3 Temporal Feature Extraction
In this work, we group traffic records by timestep and link the context with their labels. Our proposed SR-IDS can accurately reflect the time characteristics of network traffic and significantly reduce the false positive rate.
SR-IDS takes the output from the spatial feature extraction model as input and uses TimeseriesGenerator—a time series generator to convert isolated samples into a sequence. After serialization, the processed traffic is input into the BiGRU. The principle of BiGRU is to split the neurons of a regular GRU into two directions, one for positive time direction and another for negative time direction.
Assume that the current input vector is \(x_t\), the last step activation vector is \(r_{t-1}\), W and U are weight matrices used to represent the connection strength between neurons, and b is the bias vector. \(\sigma _g\) represents the sigmoid activation function, the update gate vector \(z_t\) and the reset gate vector \(r_t\) are shown as follows:
The candidate activation vector \(\hat{h}_t\) is obtained through the Hadamard product of \(r_t\) and \(h_{t-1}\), where \(\phi _h\) represents the hyperbolic tangent function:
Finally, update the activation output vector of the hidden unit \(h_t\) at time t:
When the 1D-SCAE model is completely trained, we connect it with the subsequent BiGRU network, as shown in Fig. 3. We optimize the free parameters in BiGRU to achieve the global optimum. The binary cross entropy loss function in the final binary classification is adopted to evaluate the model as follows:
where i is the sample index, N is the number of samples, \(y_i\) is the binary label of the i-th sample, and \(p(y_i)\) is the probability that the output belongs to the \(y_i\) label. For the case where the label \(y_i\) is 1, if the predicted value \(p(y_i)\) approaches 1, then the loss approaches 0. Conversely, if the predicted value \(p(y_i)\) approaches 0, the loss should be tremendous.
4 Experiments
In this section, a series of experiments are conducted to verify the efficiency and accuracy of the proposed SR-IDS. Specifically, we first present the experimental settings and some details. Then we analyze some critical parameters to find the optimal solution. Lastly, we evaluate SR-IDS’s performance and compare it with some state-of-the-art methods.
4.1 Dataset
The UNSW-NB15 dataset simulates a modern representation of network traffic [19]. Each instance in the dataset is a network flow that summarizes the activity of a sequence of unidirectional packets with contextual features. Additional features are introduced into the dataset, totaling 49 features.
4.2 Model Hierarchy
SR-IDS inputs the preprocessed data into three independent one-dimensional convolutional autoencoders and trains them separately through a greedy layer-wise strategy. Three autoencoders’ encoder layers are stacked after training by the weight-sharing method and then connects to the time series generator to produce traffic groups with contextual features. Afterwards, we use BiGRU to extract temporal feature and output the type judgment of the testing set. The complete model hierarchy and some significant parameters are shown in Table 1.
4.3 Parameters Analysis
We compare and test the influence of different learning rate and dropout ratio on convolutional layer and dense layer, as shown in Table 2. We find that the convergence speed of the entire neural network is extremely slow when the initial learning rate of the dense layer is less than 0.0001. It means the time overhead significantly increases, and the effect improvement is negligible, so we do not adopt the lower initial learning rate scheme.
We also compare the loss and accuracy of different RNN models during training iterations. Figure 4 shows the detailed performance of training loss and accuracy for attack detection. It can be seen that loss and accuracy hardly change when the epoch reaches 50, and BiGRU can achieve better performance than the other three RNN variants.
4.4 Evaluation
We compare the proposed method’s performance with some state-of-the-art methods, as shown in Table 3. Additionally, we test our model on KDD CUP 99 [26] and CIC-IDS-2017 dataset [24], which also shows well performance. In summary, our proposed SR-IDS method can achieve excellent performance in network traffic anomaly detection.
5 Conclusion
In this paper, we propose SR-IDS, an intrusion detection system for network traffic based on self-taught learning and representation learning, which simultaneously focuses on traffic’s spatial and temporal characteristics. Specifically, it utilizes 1D-SCAE to extract spatial features and BiGRU to extract temporal features. The greedy layer-wise strategy is adopted in the training process of 1D-SCAE, and sparse regularization is applied to reduce overfitting. BiGRU generates time series through TimeseriesGenerator to extract advanced time features. Multiple experiments have proved that BiGRU can achieve the best score among RNN variants. The accuracy rate of our proposed SR-IDS model in classifying network traffic on UNSW-NB15 dataset can reach 98.90%, which is more efficient than other current IDS methods.
In future research, we can consider online operations to improve robustness and stability. Furthermore, defense against attack techniques targeting deep learning models is also a research direction in the future.
References
Ahmad, R., Li, L.H., Sharma, A.K., Tanone, R.: Boundary-seeking GAN approach to improve classification of intrusion detection systems based on machine learning model. In: 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM), pp. 1–5. IEEE (2023)
Alghanam, O.A., Almobaideen, W., Saadeh, M., Adwan, O.: An improved PIO feature selection algorithm for IoT network intrusion detection system based on ensemble learning. Expert Syst. Appl. 213, 118745 (2023)
Anwer, H.M., Farouk, M., Abdel-Hamid, A.: A framework for efficient network anomaly intrusion detection with features selection. In: 2018 9th International Conference on Information and Communication Systems (ICICS), pp. 157–162. IEEE (2018)
Basati, A., Faghih, M.M.: PDAE: efficient network intrusion detection in IoT using parallel deep auto-encoders. Inf. Sci. 598, 57–74 (2022)
Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Cao, B., Li, C., Song, Y., Qin, Y., Chen, C.: Network intrusion detection model based on CNN and GRU. Appl. Sci. 12(9), 4184 (2022)
Ding, H., Chen, L., Dong, L., Fu, Z., Cui, X.: Imbalanced data classification: a KNN and generative adversarial networks-based hybrid approach for intrusion detection. Futur. Gener. Comput. Syst. 131, 240–254 (2022)
Du, R., Li, Y., Liang, X., Tian, J.: Support vector machine intrusion detection scheme based on cloud-fog collaboration. Mob. Netw. Appl. 27(1), 431–440 (2022)
Faker, O., Dogdu, E.: Intrusion detection using big data and deep learning techniques. In: Proceedings of the 2019 ACM Southeast Conference, pp. 86–93 (2019)
Ferrag, M.A., Maglaras, L., Ahmim, A., Derdour, M., Janicke, H.: RDTIDS: rules and decision tree-based intrusion detection system for internet-of-things networks. Future Internet 12(3), 44 (2020)
Gu, J., Lu, S.: An effective intrusion detection approach using SVM with naïve bayes feature embedding. Comput. Secur. 103, 102158 (2021)
Iliyasu, A.S., Abdurrahman, U.A., Zheng, L.: Few-shot network intrusion detection using discriminative representation learning with supervised autoencoder. Appl. Sci. 12(5), 2351 (2022)
Imrana, Y., Xiang, Y., Ali, L., Abdul-Rauf, Z.: A bidirectional LSTM deep learning approach for intrusion detection. Expert Syst. Appl. 185, 115524 (2021)
Imrana, Y., et al.: \(\chi \) 2-BidLSTM: a feature driven intrusion detection system based on \(\chi \) 2 statistical model and bidirectional LSTM. Sensors 22(5), 2018 (2022)
Le Jeune, L., Goedemé, T., Mentens, N.: Feature dimensionality in CNN acceleration for high-throughput network intrusion detection. In: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), pp. 366–374. IEEE (2022)
Lee, J., Park, K.: AE-CGAN model based high performance network intrusion detection system. Appl. Sci. 9(20), 4221 (2019)
Long, C., Xiao, J., Wei, J., Zhao, J., Wan, W., Du, G.: Autoencoder ensembles for network intrusion detection. In: 2022 24th International Conference on Advanced Communication Technology (ICACT), pp. 323–333. IEEE (2022)
Louk, M.H.L., Tama, B.A.: Dual-IDS: a bagging-based gradient boosting decision tree model for network anomaly intrusion detection system. Expert Syst. Appl. 213, 119030 (2023)
Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 military communications and information systems conference (MilCIS), pp. 1–6. IEEE (2015)
Nancy, P., Muthurajkumar, S., Ganapathy, S., Santhosh Kumar, S., Selvi, M., Arputharaj, K.: Intrusion detection using dynamic feature selection and fuzzy temporal decision tree classification for wireless sensor networks. IET Commun. 14(5), 888–895 (2020)
Nerlikar, P., Pandey, S., Sharma, S., Bagade, S.: Analysis of intrusion detection using machine learning techniques. Int. J. Comput. Netw. Commun. Secur. 8(10), 84–93 (2020)
Raina, R., Battle, A., Lee, H., Packer, B., Ng, A.Y.: Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 759–766 (2007)
Sahu, S.K., Mohapatra, D.P., Rout, J.K., Sahoo, K.S., Pham, Q.V., Dao, N.N.: A LSTM-FCNN based multi-class intrusion detection using scalable framework. Comput. Electr. Eng. 99, 107720 (2022)
Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 1, 108–116 (2018)
Tama, B.A., Rhee, K.H.: An in-depth experimental study of anomaly detection using gradient boosted machine. Neural Comput. Appl. 31, 955–965 (2019)
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. IEEE (2009)
Thakkar, A., Lohiya, R.: A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 55(1), 453–563 (2022)
Tian, Q., Li, J., Liu, H.: A method for guaranteeing wireless communication based on a combination of deep and shallow learning. IEEE Access 7, 38688–38695 (2019)
Yong, L., Bo, Z.: An intrusion detection model based on multi-scale CNN. In: 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), pp. 214–218. IEEE (2019)
Zhou, Y., Han, M., Liu, L., He, J.S., Wang, Y.: Deep learning approach for cyberattack detection. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 262–267. IEEE (2018)
Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2020YFB1805403) and the National Natural Science Foundation of China (61876134).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q., Yang, G., Wang, L., Fu, J., Liu, X. (2023). SR-IDS: A Novel Network Intrusion Detection System Based on Self-taught Learning and Representation Learning. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14256. Springer, Cham. https://doi.org/10.1007/978-3-031-44213-1_46
Download citation
DOI: https://doi.org/10.1007/978-3-031-44213-1_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44212-4
Online ISBN: 978-3-031-44213-1
eBook Packages: Computer ScienceComputer Science (R0)