
1 Introduction

Networks have had a profound impact on people’s lives and ways of working; at the same time, they also bring many security risks and threats. A variety of viruses, security vulnerabilities and attacks have caused losses to users, enterprises, governments and even national security. With the increasing number of network security incidents in recent years, people have developed a strong awareness of security and privacy protection. Therefore, a well-designed security system is a very important and urgent problem in the field of network information security, especially in next generation networks (5G) [1, 2], which face great security challenges.

At present, network information security protection measures are divided into passive security and active security. Passive security includes data encryption [3, 4], security authentication [5], firewalls [6] and other measures. Active security is represented by the Intrusion Detection System (IDS) [7], which detects possible intrusions by collecting network data or information, sending alerts and responding before an intrusion or a hazard occurs. With the development of IDS, it may even come to replace traditional network security measures.

In recent years, with the rise of machine learning (ML) models [8], it has become a trend to apply machine learning methods to intrusion detection systems. In the field of machine learning, Naive Bayesian Classification (NBC) [9] is widely used as one of the most classical learning algorithms with good classification accuracy. However, NBC is based on the independence of event attributes, which is difficult to achieve in realistic network behaviors, especially in future networks (such as 5G) with great complexity [10]. In response to this shortcoming, many scholars have put forward improved methods based on different attribute weights. Paper [11] proposes a weighted NBC model based on Rough Set theory; it performs well on small data sets but may change the original information. Paper [12] uses the value of every attribute as its weight, but when there are many attributes each weight coefficient is small, so the method cannot play its role in real complex networks. Paper [13] proposes a weighted NBC based on correlation coefficients, which improves the classification ability of the Bayesian classifier, but its correlation measure formula does not describe all conditions accurately.

In this paper, we propose a novel machine-learning-based IDS built on an advanced Naive Bayesian Classification (NBC-A), in which every attribute is given a weight that reflects the relation between that attribute and the final classification result. We use the ReliefF [14] algorithm, which is robust [15] and can deal with incomplete and noisy data, to estimate the weights, and we obtain a higher True Positive (TP) rate and a lower False Positive (FP) rate [16] in detection performance, which means NBC-A performs better than NBC.

The rest of this paper is organized as follows: Sect. 2 introduces the NBC model and the proposed IDS based on NBC-A; the detection performance on dataset KDD’99 and its analysis are presented in Sect. 3; the conclusion and outlook are given in Sect. 4.

2 Advanced Naive Bayesian Classification (NBC-A) Model in Intrusion Detection System (IDS)

The task of intrusion detection is to design a network behavior classifier that distinguishes normal from abnormal data in a dataset, a simulation or a realistic network, so as to realize the alarm function for attacking behaviors. At the same time, intrusion detection by an IDS deals with uncertain behavior, and Naive Bayes theory is suitable for uncertain probabilistic events. Therefore, introducing intrusion detection technology based on NBC into IDS research and design is completely reasonable.

2.1 Naive Bayesian Classification (NBC)

Bayesian decision-making theory provides a probabilistic approach to reasoning. It assumes that the variables to be investigated follow certain probability distributions and that optimal decisions can be reasoned from these probabilities and the observed data. The Naive Bayesian Classification (NBC) model, based on Bayesian decision-making theory [17], is a simplified Bayesian probability model. The classification model is simple to implement, fast in classification and high in accuracy, and it is one of the most widely used classification models in machine learning.

Given a data set with K attributes, and assuming that the values of the K attributes are discrete, the purpose of classification is to predict the class of every case in the test set, which is one part of the dataset (the other part is the train set, used to train the NBC). Consider a specific example whose attribute values are \( a_{1} \) to \( a_{k} \). The probability of the example belonging to class \( C_{i} \) is \( P\left( {{\text{C}} = c_{i} |A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} } \right) \). According to Bayesian decision-making theory:

$$ P\left( {{\text{C}} = c_{i} |A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} } \right) = \frac{{P(A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} |{\text{C}} = c_{i} )P\left( {{\text{C}} = c_{i} } \right)}}{{P\left( {A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} } \right)}} $$
(1)

Here, \( P\left( {{\text{C}} = c_{i} } \right) \) is a prior probability and can easily be calculated from the train set. The denominator \( P\left( {A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} } \right) \) is the same for every class \( c_{i} \), so for the purpose of comparing classes it can be treated as a constant (set to 1); assuming further that the attribute values are independent given the class, we have:

$$ P\left( {A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} } \right) = 1 $$
(2)
$$ P\left( {A_{1} = {\text{a}}_{1} , \ldots ,A_{k} = {\text{a}}_{k} |{\text{C}} = c_{i} } \right) = P(A_{1} = {\text{a}}_{1} |{\text{C}} = c_{i} ) \ldots P(A_{k} = {\text{a}}_{k} |{\text{C}} = c_{i} ) $$
(3)

Substituting Formulas (2) and (3) into Formula (1), we obtain the decision rule used by Naive Bayesian Classification, that is:

$$ V_{NBC} \left( x \right) = \arg max\,P\left( {{\text{C}} = c_{i} } \right)\prod {P\left( {A_{j} = {\text{a}}_{j} |{\text{C}} = c_{i} } \right)} $$
(4)

Here \( V_{NBC} \left( x \right) \) denotes the target class output by NBC. In theory, NBC has the minimum misclassification rate compared with all other classification algorithms, and it is suitable for use in an IDS to find abnormal behaviors in the network.
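To make Formula (4) concrete, the following is a minimal sketch (in Python 3, with hypothetical toy data) of the NBC decision rule, where the probabilities are estimated by frequency counts on a train set; the Laplace smoothing is an assumption added here to avoid zero probabilities and is not part of the original formula.

    from collections import Counter, defaultdict

    def train_nbc(X, y):
        # Estimate P(C = c) and the per-class value counts for P(A_j = a | C = c).
        prior = Counter(y)
        cond = defaultdict(Counter)            # (class, attribute index) -> value counts
        for row, c in zip(X, y):
            for j, a in enumerate(row):
                cond[(c, j)][a] += 1
        return prior, cond, len(y)

    def classify_nbc(x, prior, cond, n):
        # Formula (4): argmax over classes of P(C = c) * prod_j P(A_j = a_j | C = c),
        # with Laplace smoothing (an added assumption) to avoid zero factors.
        best_class, best_score = None, -1.0
        for c, n_c in prior.items():
            score = n_c / n
            for j, a in enumerate(x):
                score *= (cond[(c, j)][a] + 1) / (n_c + len(cond[(c, j)]) + 1)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # Hypothetical toy behaviors (not taken from KDD'99):
    X = [("tcp", "http"), ("tcp", "http"), ("udp", "dns"), ("icmp", "echo")]
    y = ["Normal", "Normal", "Normal", "Anomaly"]
    prior, cond, n = train_nbc(X, y)
    print(classify_nbc(("icmp", "echo"), prior, cond, n))   # prints: Anomaly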

2.2 Attribute Weighted Naive Bayesian Classification

However, the independence assumption is difficult to satisfy in real network behaviors: each network behavior has its own attributes, which have complex relationships and can directly affect the results of intrusion detection judgments. Paper [18] established an attribute-weighted NBC, which assigns a different weight to each attribute so that these relationships take effect in the NBC:

$$ V_{wNBC} \left( x \right) = \arg max\,P\left( {{\text{C}} = c_{i} } \right)\prod {P\left( {A_{j} = {\text{a}}_{j} |{\text{C}} = c_{i} } \right)^{{W_{j} }} } $$
(5)

Here, \( W_{j} \) is the weight of \( A_{j} \). Different values of \( W_{j} \) have different influences on the NBC; a larger \( W_{j} \) has a greater impact on the IDS. The key to applying NBC in an IDS is how to determine the weights of the different attributes.
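Compared with Formula (4), Formula (5) changes only one factor: each conditional probability is raised to the power of its attribute weight. A minimal sketch of this change, reusing the hypothetical helpers from the previous sketch and assuming the weight vector is already available:

    def classify_wnbc(x, prior, cond, n, weights):
        # Formula (5): argmax over classes of P(C = c) * prod_j P(A_j = a_j | C = c) ** W_j.
        best_class, best_score = None, -1.0
        for c, n_c in prior.items():
            score = n_c / n
            for j, a in enumerate(x):
                p = (cond[(c, j)][a] + 1) / (n_c + len(cond[(c, j)]) + 1)
                score *= p ** weights[j]   # a larger W_j lets attribute j influence the result more
            if score > best_score:
                best_class, best_score = c, score
        return best_class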

2.3 The \( \varvec{W}_{\varvec{j}} \) Determined by ReliefF Algorithm

In the next generation network (such as 5G), the relationships among network behaviors will be far more complex than in the current network. However, existing algorithms for determining \( W_{j} \) focus on the relations among attributes instead of on the class \( C_{i} \), and such algorithms are very difficult to apply because of the high complexity. Therefore, we propose to use the ReliefF algorithm, which directly focuses on the relationship between each attribute and the final classification result (class \( C_{i} \)) rather than the relationships between attributes. The ReliefF algorithm is as follows:

The ReliefF algorithm is a multi-class attribute selection algorithm proposed by Kononenko. Its basic idea is to assign a weight to each attribute in the attribute set, giving a higher weight to an attribute that has a direct and strong relation to the final classification (class \( C_{i} \)). For that purpose, given a randomly selected network behavior \( X_{i} \) (Step 3), ReliefF searches for its nearest neighbors: those from the same class, called nearest hits \( H \), and those from the other classes (class \( C_{o} \), \( o \ne i \)), called nearest misses \( M \) (Step 4). The function \( diff\left( {A,I_{1} ,I_{2} } \right) \) (Step 6) calculates the difference between the values of attribute \( {\text{A}} \) for two network behaviors \( I_{1} \) and \( I_{2} \). The whole process is repeated \( m \) times (Step 2), where \( m \) is a user-defined parameter. \( i,j,o \) and \( k \) are counters.

ReliefF Algorithm for determining \( W_{j} \):

Input: for each behavior \( X_{i} \) in the train set, its attribute values \( A(A_{1} = a_{1} , \ldots ,A_{j} = a_{j} , \ldots ,A_{k} = a_{k} ) \) and its class \( C_{i} \).

Output: the vector \( W(W_{1} = w_{1} , \ldots ,W_{j} = w_{j} , \ldots ,W_{k} = w_{k} ) \) of weights estimating the quality of the attributes \( {\text{A}} \).

  • Step1 set all \( {\text{W}} \) as an initial value \( W_{j} = 0 \);

  • Step2 for \( i: = 1 \) to \( m \) do begin

  • Step3 randomly select a network behavior \( X_{i} \);

  • Step4 find the \( q \) nearest hits \( H_{s} \,(s = 1,2, \ldots ,q) \) from the same class and the \( q \) nearest misses \( M_{s} \,(s = 1,2, \ldots ,q) \) from each of the other classes;

  • Step5 for \( j: = 1 \) to \( k \) do

  • Step6 \( \begin{aligned} \,W_{j} & \text{ := }W_{j} - \sum\nolimits_{s = 1}^{q} {\frac{{diff\left( {A,X_{i} ,H_{s} } \right)}}{mq}} \\ & + \sum\nolimits_{\begin{subarray}{l} Y = class\,C_{o} \\ o \ne i \end{subarray} } {\left[ {\frac{P\left( Y \right)}{{1 - P\left( {class\,C_{i} } \right)}}\sum\nolimits_{s = 1}^{q} {diff\left( {A,X_{i} ,M_{s} } \right)} } \right]} /\left( {mq} \right); \\ \end{aligned} \)

  • Step7 end;

The ReliefF algorithm does not restrict the value range of \( W_{j} \), so a weight may be negative. To avoid this situation, we apply the standardization operation [19] to \( W_{j} \), using the following formula:

$$ W_{j}^{{\prime }} = \frac{{W_{j} - min_{W} }}{{max_{W} - min_{W} }} $$
(6)

Here \( W_{j}^{{\prime }} \) is the standardized \( W_{j} \), \( min_{W} \) is the minimum value of \( W \) and \( max_{W} \) is the maximum.
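A compact sketch of Steps 1–7 together with the standardization of Formula (6), written for numeric attribute vectors; the Euclidean nearest-neighbour search and the range-scaled diff function are implementation assumptions, since the paper does not fix these details.

    import random
    import numpy as np

    def relieff_weights(X, y, m=100, q=10, seed=0):
        # Steps 1-7 of the ReliefF algorithm, followed by the standardization of Formula (6).
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n, k = X.shape
        rng = random.Random(seed)
        span = X.max(axis=0) - X.min(axis=0)
        span[span == 0] = 1.0                          # avoid division by zero in diff()
        classes, counts = np.unique(y, return_counts=True)
        prob = dict(zip(classes, counts / n))          # class prior P(Y)
        W = np.zeros(k)                                # Step 1: initial weights W_j = 0

        def diff(a, b):                                # per-attribute difference scaled to [0, 1]
            return np.abs(a - b) / span

        for _ in range(m):                             # Step 2: repeat m times
            i = rng.randrange(n)                       # Step 3: random network behavior X_i
            x_i, c_i = X[i], y[i]
            dists = np.linalg.norm(X - x_i, axis=1)
            dists[i] = np.inf                          # exclude X_i itself from the neighbours
            for c in classes:                          # Step 4: q nearest hits / misses per class
                idx = np.where(y == c)[0]
                nearest = idx[np.argsort(dists[idx])[:q]]
                contrib = diff(X[nearest], x_i).sum(axis=0) / (m * q)
                if c == c_i:
                    W -= contrib                       # Step 6: subtract the hit differences ...
                else:
                    W += prob[c] / (1 - prob[c_i]) * contrib   # ... add the weighted miss differences
        return (W - W.min()) / (W.max() - W.min())     # Formula (6): standardized weights W'_j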

2.4 The Processes of the Novel IDS Based on Model NBC-A

Combining Sects. 2.1–2.3, the advanced NBC (NBC-A) used in the novel IDS proposed in this paper is:

$$ V_{NBC - A} \left( x \right) = \arg max\,P\left( {{\text{C}} = c_{i} } \right)\prod {P\left( {A_{j} = {\text{a}}_{j} | {\text{C}} = c_{i} } \right)^{{W_{j}^{{\prime }} }} } $$
(7)
$$ W_{j}^{{\prime }} = \frac{{W_{j} - min_{W} }}{{max_{W} - min_{W} }} $$
(8)
$$ \begin{aligned} W_{j} \text{ := }W_{j} & - \mathop{\sum}\limits_{s = 1}^{q} {diff\left( {A,X_{i} ,H_{s} } \right)/\left( {mq} \right)} \\ & + \mathop{\sum}\limits_{\begin{subarray}{l} Y = class\,C_{o} \\ o \ne i \end{subarray} } {\left[ {\frac{P\left( Y \right)}{{1 - P\left( {class\,C_{i} } \right)}}\mathop{\sum}\limits_{s = 1}^{q} {diff\left( {A,X_{i} ,M_{s} } \right)} } \right]} /\left( {mq} \right) \\ \end{aligned} $$
(9)

We divide the IDS into two processes. In the train process, the Train Set contains the known network behavior data with their marked classes; it is preprocessed (discretization and feature selection), and finally the ReliefF algorithm is used to weight the attributes, yielding NBC-A. In the test process, the Test Set contains unknown network behavior data, which is discretized, and finally NBC-A is used to obtain the behavior classification results. The processes of the novel IDS are as follows (Fig. 1):

Fig. 1. The whole process is divided into two parts: the Train Process and the Test Process; the model NBC-A, based on NBC and the ReliefF algorithm, is used to obtain the network behavior class in the Test Process.
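Putting the earlier sketches together, the two processes of Fig. 1 can be written roughly as follows; the equal-width discretization and the reuse of the hypothetical helpers train_nbc, classify_wnbc and relieff_weights are assumptions for illustration (and the symbolic KDD’99 attributes would first need to be encoded numerically), not the exact preprocessing used in the experiments.

    import numpy as np

    def run_ids(X_train, y_train, X_test, bins=10):
        # Rough sketch of Fig. 1: preprocess, train NBC, weight attributes with ReliefF,
        # then classify the unknown test behaviors with NBC-A (Formula (7)).
        Xtr = np.asarray(X_train, dtype=float)
        lo, hi = Xtr.min(axis=0), Xtr.max(axis=0)
        width = np.where(hi > lo, (hi - lo) / bins, 1.0)
        disc = lambda X: np.clip(((np.asarray(X, float) - lo) / width).astype(int), 0, bins - 1)

        # Train process: discretize, estimate the probabilities, estimate the weights W'_j.
        Xtr_d = disc(Xtr)
        prior, cond, n = train_nbc([tuple(r) for r in Xtr_d], y_train)
        W = relieff_weights(Xtr_d, y_train)

        # Test process: discretize the unknown behaviors and apply NBC-A.
        return [classify_wnbc(tuple(r), prior, cond, n, W) for r in disc(X_test)]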

3 Detection Performance and Analysis

3.1 Dataset KDD’99 for Detection Performance

We utilize the KDD Cup 1999 dataset (KDD’99) [20] as the data for evaluating detection performance. KDD’99 is the standard dataset for intrusion detection and consists of two parts: 7 weeks of train data, about 5,000,000 network connections, and 2 weeks of test data, about 2,000,000 network connections. Each network connection record is marked as normal (Normal) or abnormal (Anomaly), and the abnormal records are divided into 4 categories covering 39 kinds of attack types. To save time and fit the computer’s capability, we use the 10% subset of KDD’99 as the performance data. The distribution of the data is as follows (Table 1):

Table 1. The distribution of intrusion types in 10% KDD’99
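For reference, a short sketch of how the 10% KDD’99 subset and the 80%/20% split used in Sect. 3.2 can be prepared; the local file name, the use of pandas and scikit-learn, and the mapping of every attack label to “Anomaly” are assumptions for illustration (the file contains 41 attribute columns followed by a label column).

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Assumed local copy of the 10% subset of KDD'99.
    df = pd.read_csv("kddcup.data_10_percent", header=None)
    X = df.iloc[:, :41]
    y = df.iloc[:, 41].map(lambda s: "Normal" if s.rstrip(".") == "normal" else "Anomaly")

    # 80% Train Set / 20% Test Set, as used in the experiments of Sect. 3.2.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    print(y.value_counts())        # Normal vs. Anomaly distribution (cf. Table 1)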

3.2 Performance Analysis

In the detection performance experiments, the platform environment is: Operating System: Windows 7 Ultimate, CPU 3.00 GHz, RAM 8 GB, Hard Disk 500 GB; Programming tool: Spyder (Python 2.7); Dataset: 10% KDD’99 (80% Train Set and 20% Test Set). The detection performance results are as follows:

  (1) From Fig. 2, we can see that the accuracy of NBC-A is considerably higher than that of NBC, and the error rate of NBC-A is lower than that of NBC in intrusion detection. Specifically, the average accuracy of NBC-A is 98.50% versus 91.73% for NBC, and the average error rate of NBC-A is 5.79% versus 11.98% for NBC. This means that model NBC-A performs considerably better than NBC in data mining by using the ReliefF algorithm.

    Fig. 2. The accuracy of NBC-A is higher than that of NBC and its error rate is lower, which means NBC-A performs better than NBC in the detection experiments.

  (2) From Fig. 3, we can see that, for different types of intrusion attacks, the TP rate of NBC-A is generally higher than that of NBC and the FP rate of NBC-A is much lower. This means that the novel IDS proposed in this paper, which uses the ReliefF algorithm to weight the attributes, achieves a good effect and can effectively detect intrusion behaviors in the network to ensure the safety of the system (a sketch of how these metrics can be computed is given after this list).

    Fig. 3. The TP rate of NBC-A is higher than that of NBC and its FP rate is lower, which means the novel IDS based on NBC-A is more secure and useful than the one based on NBC.
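The following sketch shows one standard way to compute the reported metrics from the predicted labels, with “Anomaly” treated as the positive class; the exact definitions used for the error rate and the TP/FP rates [16] are not spelled out above, so the conventional ones are assumed here.

    def detection_metrics(y_true, y_pred, positive="Anomaly"):
        # Confusion-matrix counts with the abnormal class as the positive class.
        tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        accuracy = (tp + tn) / len(y_true)
        return {
            "accuracy": accuracy,                            # correctly classified behaviors / all behaviors
            "error rate": 1 - accuracy,                      # assumed complement of accuracy
            "TP rate": tp / (tp + fn) if tp + fn else 0.0,   # detected intrusions / all intrusions
            "FP rate": fp / (fp + tn) if fp + tn else 0.0,   # false alarms / all normal behaviors
        }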

4 Conclusion and Outlook

Facing the massive and complex network attacks on the current Internet and the even more massive and complex next generation network (5G), it is reasonable to apply NBC, based on Naive Bayes decision theory, to the IDS. We propose a novel IDS based on NBC-A, which improves on NBC. The novel IDS proposed in this paper utilizes the ReliefF algorithm to estimate attribute weights. Compared with other algorithms, ReliefF is more robust and efficient, and it directly reflects the relationship between the attributes and the final class results, so model NBC-A is more suitable for large-scale and high-complexity networks. The detection performance shows a higher TP rate and a lower FP rate, which means that NBC-A performs better than NBC and has practical application significance.

In this novel IDS, there is still much room to improve the classifier’s performance. In future work, we will study how to combine the ReliefF algorithm, the relationships among attributes and other machine learning classification algorithms more effectively, so as to further enhance the classifier’s ability of intrusion detection in complex networks.