1 Introduction

An intrusion detection system (IDS) is a software application or hardware appliance that monitors traffic moving across networks and through systems, searches for suspicious activity and known threats, and raises alerts when it finds them. There are two types of IDS: host-based and network-based. In a host-based IDS, an intelligent software agent monitors the packets entering and leaving a device. It performs log analysis, file integrity checking, policy monitoring, rootkit detection, real-time alerting, and active response. In a network-based IDS, sensors do the monitoring work; connected to the network, they monitor and analyze its traffic. Similarly, there are two types of IDS techniques: signature-based and anomaly-based. A signature-based IDS uses specific signature patterns to analyze the content of each packet across all 7 layers, whereas an anomaly-based IDS monitors the network traffic and compares it against an established baseline of normal use. This classification helps to identify whether network traffic is normal or anomalous.

2 Background

Anomaly-based detection (ABD) [1] identifies intrusions by observing behavior. Any deviation from normal activity is reported. There are two types of anomaly detection: the self-learning system and the programmed model. In the programmed model, the system is trained to detect abnormal changes, and the administrator decides a threshold that is used to flag any abnormality. The self-learning system operates from a set of standard normal operations; its model is built by observing the network over a period of time. Lu and Traore [2] implemented a genetic programming-based intrusion detection system using the DARPA dataset and reported a low false positive rate (FPR). Bankovic et al. [3] used the KDD Cup 99 dataset with a principal component analysis-based method to extract data.
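The threshold idea behind the programmed model can be illustrated with a minimal sketch. It assumes, purely for illustration, that normal behavior is summarized by the mean and standard deviation of a single traffic metric and that the administrator's threshold is expressed in standard deviations; the function names and the choice of metric are hypothetical and are not taken from the cited works.

```python
from statistics import mean, stdev


def build_baseline(normal_samples):
    """Summarize normal behavior as (mean, standard deviation).

    Assumption: one numeric metric (e.g. packets per second) is enough
    to describe normal use. Needs at least two samples.
    """
    return mean(normal_samples), stdev(normal_samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value that deviates from the baseline by more than
    `threshold` standard deviations (administrator-chosen threshold)."""
    mu, sigma = baseline
    if sigma == 0:
        # A constant baseline: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    baseline = build_baseline([100, 102, 98, 101, 99])
    for observed in (101, 150):
        print(observed, is_anomalous(observed, baseline))
```

A self-learning system would differ mainly in how the baseline is obtained: it would be re-estimated from observed traffic over time rather than fixed in advance.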

3 Proposed System Design

The overall functional diagram of the proposed system is shown in Fig. 1. The network information collected over time and the corresponding data are pre-processed, extracted, and stored in a relational database. From the database, the required knowledge is extracted using the GNP-based fuzzy rule extraction method [4]. The initially defined rules are updated by computing their support, confidence, and chi-square values. The datasets are classified according to these rules.

Fig. 1 Overall system design

With this system, intruders can be classified accurately by the proposed GNP-based classifier [10]. The classifier [5, 6] uses both binary and continuous values for rule extraction. The working principle of the system is explained below:

The extracted dataset consists of the source IP address, destination IP address, and source and destination port numbers. During pre-processing, missing elements and redundant data are eliminated, as shown in Fig. 2.

Fig. 2 Data pre-processing
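The pre-processing step can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary with the four attributes named in the text; the field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`) are hypothetical, and the actual system stores its data in a relational database rather than in Python lists.

```python
# Hypothetical field names for the four attributes described in the text.
FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port")


def preprocess(records):
    """Drop records with missing elements and remove redundant duplicates.

    The first occurrence of each distinct record is kept, in input order.
    """
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(rec.get(field) for field in FIELDS)
        # Missing element: the field is absent, None, or an empty string.
        if any(value is None or value == "" for value in key):
            continue
        # Redundant data: an identical record was already kept.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(zip(FIELDS, key)))
    return cleaned


if __name__ == "__main__":
    raw = [
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234, "dst_port": 80},
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234, "dst_port": 80},
        {"src_ip": "10.0.0.3", "dst_ip": None, "src_port": 4321, "dst_port": 443},
    ]
    print(preprocess(raw))
```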

To simplify fuzzy rule formation, the continuous attributes in the database are transformed into the linguistic terms α, β, and γ, which represent low, mild, and high values, respectively. GNP-based fuzzy rule mining is used in this paper to combine the discrete and continuous values. The fuzzy rules are extracted and updated using their confidence and support values [11]. This process is shown in Fig. 3a, b.

Fig. 3 Fuzzy attribute calculation
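The linguistic transformation of a continuous attribute can be sketched as below. The text does not state the shape of the membership functions, so this example assumes overlapping triangular/shoulder functions defined by three breakpoints `lo`, `mid`, and `hi`; these parameters and the function name `fuzzify` are illustrative assumptions, not the authors' definitions.

```python
def fuzzify(value, lo, mid, hi):
    """Map a continuous value to membership degrees for the linguistic
    terms alpha (low), beta (mild), and gamma (high).

    Assumed shape: alpha is 1 at or below `lo`, beta peaks at `mid`,
    gamma is 1 at or above `hi`, with linear transitions in between.
    The three degrees always sum to 1.
    """
    if not lo < mid < hi:
        raise ValueError("breakpoints must satisfy lo < mid < hi")
    if value <= lo:
        return {"alpha": 1.0, "beta": 0.0, "gamma": 0.0}
    if value >= hi:
        return {"alpha": 0.0, "beta": 0.0, "gamma": 1.0}
    if value <= mid:
        t = (value - lo) / (mid - lo)  # progress from low toward mild
        return {"alpha": 1.0 - t, "beta": t, "gamma": 0.0}
    t = (value - mid) / (hi - mid)  # progress from mild toward high
    return {"alpha": 0.0, "beta": 1.0 - t, "gamma": t}


if __name__ == "__main__":
    # Example: an attribute ranging over 0..100 with the midpoint at 50.
    for v in (0, 25, 50, 75, 100):
        print(v, fuzzify(v, lo=0, mid=50, hi=100))
```

Discrete (binary) attributes need no such transformation, which is why a rule can mix both kinds of values once the continuous ones have been mapped to α, β, and γ.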

Another important parameter used to update the rules is the chi-square value (C). Let x and y be the support values of the attribute sets X and Y, respectively, and let z be the support of their union X ∪ Y. The updated value of C for N tuples is then calculated as shown in Eq. (1). The implementation result is shown in Fig. 4.

$$C = \frac{N(z - xy)^{2}}{xy(1 - x)(1 - y)}$$
(1)

Fig. 4 Chi-square updation
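Eq. (1) can be evaluated together with the support and confidence of a rule, as in the following minimal sketch. It assumes crisp (true/false) matching of the antecedent X and consequent Y on each tuple, which is a simplification of the fuzzy matching used in the paper, and it uses the standard definition of confidence as z/x. The function name and the predicate arguments are hypothetical.

```python
def rule_statistics(rows, antecedent, consequent):
    """Compute support, confidence, and the chi-square value of X => Y.

    rows       -- list of tuples/records (N = len(rows))
    antecedent -- predicate returning True when a row satisfies X
    consequent -- predicate returning True when a row satisfies Y
    """
    n = len(rows)
    if n == 0:
        raise ValueError("at least one tuple is required")
    x = sum(1 for r in rows if antecedent(r)) / n  # support(X)
    y = sum(1 for r in rows if consequent(r)) / n  # support(Y)
    z = sum(1 for r in rows if antecedent(r) and consequent(r)) / n  # support(X ∪ Y)
    confidence = z / x if x > 0 else 0.0
    denom = x * y * (1 - x) * (1 - y)
    # Eq. (1); undefined when any support is 0 or 1, reported here as 0.
    chi_square = n * (z - x * y) ** 2 / denom if denom > 0 else 0.0
    return {"support": z, "confidence": confidence, "chi_square": chi_square}


if __name__ == "__main__":
    # Toy data: (destination port, label).
    rows = [(23, "attack"), (23, "attack"), (80, "normal"), (443, "normal")]
    stats = rule_statistics(
        rows,
        antecedent=lambda r: r[0] == 23,
        consequent=lambda r: r[1] == "attack",
    )
    print(stats)
```

In this toy data the antecedent and consequent always occur together, so x = y = z = 0.5 and C equals N, the largest value Eq. (1) can take for four tuples.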

The fitness (f) of a fuzzy rule [7] is determined by Eq. (2), and the implementation is shown in Fig. 5. Here d_r and d_ir are the numbers of correctly and incorrectly classified data, and T and N are the total numbers of training and test data, respectively. The value of f lies in the range [−1, 1]. A high fitness value corresponds to a low false positive rate, and vice versa.

$$f = \frac{d_{\text{r}}}{T} - \frac{d_{\text{ir}}}{N}$$
(2)

Fig. 5 GNP-based fuzzy rule implementation
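As a worked illustration of Eq. (2), the fitness can be computed directly from the four counts. The function name and the example counts are hypothetical; the sketch only restates the formula and checks that the inputs keep f within [−1, 1].

```python
def rule_fitness(d_r, d_ir, T, N):
    """Fitness of a fuzzy rule according to Eq. (2): f = d_r / T - d_ir / N.

    d_r  -- number of correctly classified data
    d_ir -- number of incorrectly classified data
    T, N -- total numbers of training and test data, as defined in the text
    """
    if T <= 0 or N <= 0:
        raise ValueError("T and N must be positive")
    if not (0 <= d_r <= T and 0 <= d_ir <= N):
        raise ValueError("counts must satisfy 0 <= d_r <= T and 0 <= d_ir <= N")
    return d_r / T - d_ir / N


if __name__ == "__main__":
    # Hypothetical counts: 90 of 100 classified correctly, 5 of 100 incorrectly.
    print(rule_fitness(d_r=90, d_ir=5, T=100, N=100))
```

The two extremes follow from the formula: f = 1 when every datum is classified correctly and none incorrectly, and f = −1 in the opposite case.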

4 Conclusion

In this work, a GNP classifier based on fuzzy rule generation is designed for sub-attribute selection and utilization. This intrusion detection classifier [8, 9] is used to detect anomalies in the network. The proposed system extracts many effective rules that can be used for anomaly detection.