1 Introduction

In recent decades, malicious behavior by some Internet users has prompted researchers to develop a variety of intrusion detection techniques. Based on the source of information, two kinds of Intrusion Detection Systems (IDS) have been proposed: host-based and network-based [1]. A host-based IDS runs on a host computer, whereas a network-based IDS monitors the data exchanged between computers. Based on how events are analyzed, two kinds of IDS exist: anomaly-based [2] and misuse-based [3]. An anomaly-based IDS detects activities that differ from the established patterns of users, and a misuse-based IDS compares users’ activities with the known behaviors of attackers.

Many soft computing approaches have been applied to the intrusion detection field. Anomaly-based detection techniques can be classified into three main categories: statistical-based [4], knowledge-based [5], and machine learning (e.g., Bayesian networks [6], Markov models [7], Artificial Neural Networks (ANNs) [8, 9], fuzzy logic [10, 11], genetic algorithms [12], and clustering and outlier detection [13]).

The detection techniques used in misuse-based IDS can also be classified into three similar categories: statistical-based [14], knowledge-based [15, 16], and machine learning (e.g., Bayesian networks [17], ANNs [18–22], fuzzy logic [10], genetic algorithms [23], clustering [24], decision trees [25, 26], and hybrid systems [27–30]).

In this paper, a reduced-size Recurrent Neural Network (RNN) structure, based on the grouping of features, is used for misuse detection. Because the RNN is smaller, its training speed and convergence are improved. The result is a fast IDS that is effective in terms of Detection Rate (DR) and Cost Per Example (CPE).

The data set of the International Knowledge Discovery and Data Mining group (KDD) [31] is used to train and test the proposed model in this study. Each connection in KDD is characterized by 41 features and a label that specifies the status of the connection record (normal or a specific attack type). These features are used as the inputs of the RNN and are grouped into four categories: basic features (B-F), content features (C-F), time-based traffic features (TT-F), and host-based traffic features (HT-F). The RNN has five outputs, one of which indicates the normal class (no attack). The other four outputs represent the type of detected attack: Denial-of-Service (DoS), Probe, Remote-to-Local (R2L), and User-to-Root (U2R). To reduce the size and computational complexity of the RNN-based IDS, the nodes of the layers are partially connected according to these four feature categories.

Experimental results show that the proposed model improves the classification rate, particularly for R2L attacks. The method also offers better DR and CPE than similar related works and than the simulated Multi-Layer Perceptron (MLP) and Elman-based IDS with the same number of hidden layer nodes. In addition, the False Alarm Rate (FAR) of the proposed model is not degraded significantly when compared to some recent machine learning methods.

The remainder of this paper is organized as follows. Section 2 provides the KDD data set details. The architecture of the proposed model is introduced in Sect. 3. Simulations and experimental results are reported in Sect. 4. Conclusions are discussed in Sect. 5.

2 KDD intrusion data

In 1999, recorded network traffic from the Defense Advanced Research Projects Agency (DARPA) data set was summarized into network connections with 41 features per connection [31]. This data set formed the benchmark provided by KDD. The KDD data set defines four main categories of attacks: DoS, Probe, R2L, and U2R. It consists of three components: “10% KDD”, “Corrected KDD”, and “Whole KDD” [31] (Table 1).

Table 1 Number of samples in KDD data set

As is common in the literature, the analysis in this paper is performed on the “10% KDD” data set [21]. Each connection in KDD is characterized by 41 features (listed in Table 2). As mentioned earlier, these features are grouped into four categories: basic features, content features, time-based traffic features, and host-based traffic features.

Table 2 Description and category of 41 features in KDD data set

Basic features can be derived from packet headers without inspecting the payload. For the content features, domain knowledge is used to assess the payload of the original Transmission Control Protocol (TCP) packets. Time-based traffic features are designed to capture properties that develop over a two-second temporal window. Host-based traffic features use a historical window defined over a number of connections instead of time. They are therefore designed to assess attacks that span intervals longer than 2 s.
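The difference between the two traffic windows can be illustrated with a minimal sketch. The function names, the (timestamp, destination host) record layout, and the 100-connection window size are assumptions made for illustration; the text only states that the host-based window is defined over a number of connections rather than over time.

```python
# Illustrative sketch only: record layout, names, and window sizes are assumptions.

def count_time_window(conns, idx, window_s=2.0):
    """Earlier connections to the same host within the past window_s seconds."""
    t, host = conns[idx]
    return sum(1 for (t2, h2) in conns[:idx] if h2 == host and t - t2 <= window_s)


def count_conn_window(conns, idx, window_n=100):
    """Earlier connections to the same host among the previous window_n connections."""
    _, host = conns[idx]
    recent = conns[max(0, idx - window_n):idx]
    return sum(1 for (_, h2) in recent if h2 == host)


# A slow scan: one connection to host "A" every 5 seconds.
slow_scan = [(0.0, "A"), (5.0, "A"), (10.0, "A"), (15.0, "A")]

time_based = count_time_window(slow_scan, 3)   # nothing falls inside 2 s
host_based = count_conn_window(slow_scan, 3)   # all three earlier connections count
```

In this toy trace the time-based count stays at zero while the connection-based count grows, which is the reason a connection-count window can expose attacks that are spread over more than 2 s.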

3 The proposed model

As mentioned before, a partially connected RNN with two hidden layers is used as a misuse-based IDS in this work (Fig. 1). The categorized features defined in Sect. 2 are used as the inputs of the RNN. As shown in Fig. 1, the connections between the 41 input nodes and the nodes of the first hidden layer follow the categorization of features. The connections between the nodes of the two hidden layers are also partial. The RNN has five output neurons (representing the normal class and the four attack types).

Fig. 1 Partially connected RNN-based IDS
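One common way to express group-wise partial connectivity is a binary mask over a weight matrix, sketched below for the input-to-first-hidden-layer connections only. This is not the authors' implementation: the group sizes assume the conventional KDD grouping (9 basic, 13 content, 9 time-based, 10 host-based features), the number of hidden nodes per group is hypothetical, and the recurrent connections are omitted.

```python
# Illustrative sketch: group sizes follow the conventional KDD grouping (assumed),
# and the hidden-node allocation per group is hypothetical.
GROUP_SIZES = {"B-F": 9, "C-F": 13, "TT-F": 9, "HT-F": 10}      # sums to 41 inputs
HIDDEN_PER_GROUP = {"B-F": 2, "C-F": 2, "TT-F": 2, "HT-F": 2}   # hypothetical


def block_mask(in_sizes, out_sizes):
    """mask[h][i] is 1 only when hidden node h and input i share a feature group."""
    in_groups = [g for g, n in in_sizes.items() for _ in range(n)]
    out_groups = [g for g, n in out_sizes.items() for _ in range(n)]
    return [[1 if og == ig else 0 for ig in in_groups] for og in out_groups]


def masked_preactivation(weights, mask, x):
    """Weighted sums of the inputs, with masked-out connections contributing nothing."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


mask = block_mask(GROUP_SIZES, HIDDEN_PER_GROUP)
n_partial = sum(sum(row) for row in mask)   # connections kept by the mask
n_full = len(mask) * len(mask[0])           # connections in a fully connected layer

ones = [[1.0] * 41 for _ in mask]
pre = masked_preactivation(ones, mask, [1.0] * 41)
```

With these assumed sizes the mask keeps 82 of the 328 possible input connections, which illustrates how grouping reduces the number of trainable weights.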

The features in the KDD data sets have different forms (discrete, continuous, and symbolic) with significantly varying resolution and ranges. Most pattern classification methods are not able to process data in such a format. Therefore, some preprocessing is required.

Symbolic-valued features, such as protocol_type (with 3 different symbols), service (with 70 different symbols), and flag (with 11 different symbols), are mapped to integer values ranging from 0 to N−1, where N is the number of symbols. Continuous features with smaller integer value ranges, such as wrong_fragment [0,3], urgent [0,14], hot [0,101], num_failed_logins [0,5], num_compromised [0,9], num_root [0,7468], num_file_creations [0,100], num_shells [0,5], num_access_files [0,9], count [0,511], srv_count [0,511], dst_host_count [0,255], and dst_host_srv_count [0,255], are scaled linearly to the [0,1] range.
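The symbolic mapping and the linear scaling can be sketched as follows. The helper names are hypothetical, and the assignment of integers in sorted order is an arbitrary choice; the text does not specify which symbol receives which integer.

```python
# Illustrative sketch: helper names and the sorted-order code assignment are assumptions.

def encode_symbols(values):
    """Map each distinct symbol to an integer in 0..N-1."""
    table = {s: i for i, s in enumerate(sorted(set(values)))}
    return [table[v] for v in values], table


def scale_linear(values, lo, hi):
    """Min-max scaling of values from [lo, hi] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


# protocol_type has 3 symbols, so its codes fall in 0..2.
codes, table = encode_symbols(["tcp", "udp", "icmp", "tcp"])
scaled_codes = scale_linear(codes, 0, len(table) - 1)

# count has the range [0, 511].
scaled_count = scale_linear([0, 511], 0, 511)
```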

For the three features that span a very large integer range, logarithmic scaling (base 10) is applied. These features are duration [0,58329], src_bytes [0,1.3 billion], and dst_bytes [0,1.3 billion]; their spans are reduced to [0,4.77] for duration and [0,9.11] for src_bytes and dst_bytes. The other features are either Boolean, like logged_in, or continuous in the range [0,1], like diff_srv_rate, so no scaling is needed for them. Finally, each of the mapped features is linearly scaled to the [0,1] range.
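A minimal sketch of the logarithmic scaling is given here. Since log10(0) is undefined and the stated ranges start at 0, the sketch assumes the transform log10(1 + x); this offset is an assumption, not something the text specifies. The function name is hypothetical.

```python
import math

# Illustrative sketch: the "+ 1" offset is an assumption that keeps x = 0 mapped to 0.
DURATION_MAX = 58329


def log_scale(x, x_max):
    """Base-10 log scaling followed by linear scaling to [0, 1]."""
    return math.log10(1 + x) / math.log10(1 + x_max)


duration_span = math.log10(1 + DURATION_MAX)   # upper end of the reduced range
shortest = log_scale(0, DURATION_MAX)
longest = log_scale(DURATION_MAX, DURATION_MAX)
```

Under this assumption the upper end of the reduced duration range rounds to 4.77, matching the span quoted in the text.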

4 Experimental results

In this work, 49,402 records from the “10% KDD” data set and 31,104 records from the “Corrected KDD” data set are used as the training and test sets, respectively (Table 3). Except for the U2R test samples, these sets have the same distribution of attack categories as the corresponding KDD data sets.

Table 3 Size of the training and test sets

DR and FAR are the two most common standard metrics for evaluating IDS. DR is computed as the ratio between the number of correctly detected attacks and the total number of attacks, while FAR is computed as the ratio between the number of normal connections that are misclassified as attacks and the total number of normal connections. For evaluating classifier algorithms, another comparative measure, Cost Per Example (CPE) [32], is defined. CPE is calculated using the following relation:

$$ {\text{CPE}} = \frac{1}{T}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {{\rm CM}(i,j) \cdot C(i,j)} } $$
(1)

where CM and C are the confusion matrix and the cost matrix, respectively, T is the total number of test instances, and m is the number of classes. CM is a square matrix in which each column corresponds to a predicted class and each row corresponds to an actual class. An entry at row i and column j, CM(i,j), is the number of instances that belong to class i but are identified as members of class j; for i ≠ j these are misclassifications. The entries of the primary diagonal, CM(i,i), are the numbers of correctly classified instances. The cost matrix is defined similarly: entry C(i,j) is the cost penalty for misclassifying an instance of class i as class j. The cost matrix values employed for the KDD classifier learning contest are shown in Table 4 [31].

Table 4 Cost matrix values for KDD
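The three metrics can be computed from a confusion matrix as sketched below. The 3-class confusion matrix and the cost values are made-up toy numbers, not the values of Table 4 or any result of this paper. The sketch also assumes that an attack counts as detected when it is assigned to any attack class, which is one possible reading of the DR definition.

```python
# Illustrative sketch with toy numbers; rows are actual classes, columns are predictions.
# Class 0 is assumed to be "normal"; the other classes are attack types.

def cpe(cm, cost):
    """Cost Per Example: (1/T) * sum_i sum_j CM(i,j) * C(i,j)."""
    total = sum(sum(row) for row in cm)
    m = len(cm)
    weighted = sum(cm[i][j] * cost[i][j] for i in range(m) for j in range(m))
    return weighted / total


def detection_rate(cm, normal=0):
    """Attacks not labeled normal, divided by all attacks (assumed reading of DR)."""
    attacks = sum(sum(row) for i, row in enumerate(cm) if i != normal)
    missed = sum(row[normal] for i, row in enumerate(cm) if i != normal)
    return (attacks - missed) / attacks


def false_alarm_rate(cm, normal=0):
    """Normal connections labeled as any attack, divided by all normal connections."""
    row = cm[normal]
    return (sum(row) - row[normal]) / sum(row)


toy_cm = [[90, 6, 4],
          [5, 45, 0],
          [10, 5, 35]]
toy_cost = [[0, 1, 1],     # hypothetical cost values
            [2, 0, 1],
            [2, 1, 0]]

toy_cpe = cpe(toy_cm, toy_cost)
toy_dr = detection_rate(toy_cm)
toy_far = false_alarm_rate(toy_cm)
```

Because the diagonal of the cost matrix is zero, correctly classified instances add nothing to CPE, so a lower CPE indicates fewer or cheaper misclassifications.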

The confusion matrix and training time of the proposed RNN model are reported in Table 5. The confusion matrices and training times of MLP and Elman-based neural classifiers, with the same number of nodes in hidden layers, are also reported in Table 5.

Table 5 Confusion matrix and training time of proposed RNN model in comparison with MLP and Elman classifiers

The performance of the proposed model has also been compared to some other machine learning methods in terms of DR, FAR, and CPE (Table 6). As shown in Table 6, the proposed RNN model performs better in terms of DR and CPE. The FAR of the proposed IDS is not degraded significantly when compared to some recent machine learning methods.

Table 6 Performance comparison of models in intrusion detection

5 Conclusions

In this paper, a partially connected RNN model with four groups of input features has been proposed as a misuse-based IDS. Experimental results have shown that the reduced-size neural classifier improves classification rates, especially for the R2L attack category, when compared to other classifiers. The proposed model shows better performance in terms of DR and CPE when compared to some recent related works. The FAR metric has also been improved in comparison with some recent machine learning methods.