The DDoS attacks detection through machine learning and statistical methods in SDN

Banitalebi Dehkordi, Afsaneh; Soltanaghaei, MohammadReza; Boroujeni, Farsad Zamani

doi:10.1007/s11227-020-03323-w

Published: 15 June 2020

Volume 77, pages 2383–2415, (2021)

The Journal of Supercomputing

Afsaneh Banitalebi Dehkordi ORCID: orcid.org/0000-0003-4569-7203¹,
MohammadReza Soltanaghaei¹ &
Farsad Zamani Boroujeni¹

Abstract

The distributed denial-of-service (DDoS) attack is a security challenge for the software-defined network (SDN). Existing DDoS detection methods have several limitations, including dependency on the network topology, inability to detect all DDoS attacks, use of outdated and invalid datasets and the need for powerful and costly hardware infrastructure. Their use of static thresholds and their dependency on data from previous periods reduce their flexibility against new attacks and increase the attack detection time. A new method is proposed to detect DDoS attacks in SDN. It consists of three sections: collector, entropy-based and classification. The experimental results obtained on the UNB-ISCX, CTU-13 and ISOT datasets indicate that this method outperforms its counterparts in terms of accuracy in detecting DDoS attacks in SDN.

1 Introduction

SDN is a new network architecture consisting of three layers, the data, control and application planes, in which the data and control planes are decoupled from each other, as shown in Fig. 1. The data plane consists of the switches and routers that forward network traffic; the control plane is the intelligent part of the network and is implemented by controllers such as NOX, POX, Beacon, Floodlight and OpenDaylight; and the application plane contains applications for configuring the SDN [1].

IT organizations may face security threats such as DDoS attacks because of a lack of network coherency while reconfiguring their networks to SDN [2]. DDoS is one of the most damaging attacks on the Internet: it degrades the network and the server by exhausting network bandwidth or connectivity so that regular service cannot be provided [3]. As shown in Fig. 2, the attackers send too many requests to the OpenFlow switch from different hosts simultaneously, overwhelming the network.

DDoS attacks target a wide range of resources and sites, from server banks to websites, posing major challenges for the administrators and users of these systems. On February 28, 2018, GitHub, one of the most important code-hosting platforms, was hit by 1.3 Tbps of attack traffic, which took it offline for 5 min and caused the site many problems [4]. Between February 5 and March 1, 2019, about 17 DDoS attacks were launched against the University of Albany site, disrupting its servers for at least 5 min. Although instructor and student data were not affected, some of the servers went offline [5]. Such recurring attacks call for procedures to detect and prevent DDoS attacks.

Existing approaches in this context have advantages but also the following drawbacks: difficulty in selecting appropriate time periods for monitoring traffic in periodic methods [6]; shortcomings and delays in detecting DDoS attacks, which may waste resources such as bandwidth and CPU [7], deactivate the controller and switches and unnecessarily increase response time [8]; and the high cost of adding hardware to maintain network security.

To overcome these drawbacks, this article proposes a method for detecting DDoS attacks in SDN that combines statistical and machine learning methods.

The proposed method includes a mechanism for selecting the time period used to monitor attacks, which existing methods do not consider. It selects the time period that achieves the maximum detection rate, which is not necessarily the shortest or the longest one. Periodic monitoring and scheduled traffic screening reduce the controller's workload. Another advantage of this approach is that no custom hardware is needed to detect attacks. The method also increases DDoS detection accuracy and is independent of the network topology.

The attacks assessed here are HTTP-based application-layer attacks [9], in both their low-volume and high-volume forms. High-volume attacks send many requests to a server or computer and consume its bandwidth and processing capacity [10], while low-volume attacks generate less incoming traffic, which skilled attackers can use to deceive detection systems [11]. The proposed method assesses both forms. The model consists of collector, entropy-based and classification sections.

The collector section gathers statistical information from the switches and hosts in the controller. The entropy-based section calculates the entropy and the static and dynamic thresholds.

The classification section extracts 15 features for the hosts of each flow and records data samples of incoming packets. These samples are used as training inputs to build models with different classification algorithms.

The method achieves 99.85% accuracy with an FPR of 0.1 on the UNB-ISCX dataset and 99.12% accuracy on the CTU-13 dataset, indicating that it outperforms its counterparts. The main contribution of this article is combining machine learning and statistical methods to improve the detection of DDoS attacks in SDN networks; existing methods have not exploited this combination to achieve higher detection performance.

This article is organized as follows: the literature is reviewed in Sect. 2; the method is proposed in Sect. 3; the datasets are presented in Sect. 4; the model is evaluated in Sect. 5; the model is implemented in Sect. 6; the experimental results are presented in Sect. 7; the computational complexity and time cost are analyzed in Sect. 8; comparative performance experiments are reported in Sect. 9; and the article is concluded in Sect. 10.

2 Literature review

Many studies address DDoS attack detection; the findings of some of them are summarized in this section.

The authors of [12] combined K-means clustering and Naive Bayes for DDoS attack detection in two steps: (1) grouping data with similar behavior into clusters and labeling all data according to the K clusters and (2) classifying the labeled groups with the Naive Bayes algorithm.

Computer vision techniques are applied to detect DDoS attacks in [13], where, unlike statistical and machine learning methods, traffic records are treated as images and attack detection is viewed as a computer vision problem. A multivariate correlation analysis method is introduced to characterize traffic records accurately and convert them into images. The images are compared with the Earth mover's distance (EMD), which measures the distance between two probability distributions.

To detect known and unknown DDoS attacks, the authors of [14] applied an artificial neural network (ANN) and showed that its performance depends on training the algorithm with the given dataset. They compared their method with backpropagation (BP), Chi-square, support vector machines (SVM) and Snort, and obtained a detection accuracy of 98%.

DDoS attack detection in cloud computing and SDN networks is assessed in [15], where models with different features are applied to the datasets in both training and testing; the models must be updated to remain efficient. Among the three DDoS detection models proposed for SDN networks, the best is Mglobal, with 89.30% accuracy.

The authors of [16] used several features to determine whether an attack has occurred. Because more than one major parameter matters in identifying DDoS attacks, the key issue is how these parameters are chosen; for example, the destination Internet Protocol (IP) address is one such parameter, and its distribution can be measured by entropy. The detection method is evaluated with this model over many parameters.

A fast attack detection method based on a neural network algorithm is proposed in [17] to decrease the workload of controllers and switches. A combination of entropy-based and classification algorithms is presented as well. This method can detect both high-volume and low-volume DDoS attacks.

The authors of [18] built their model with two data mining algorithms, C5.0 and Ripper, and tested it on the UNB-ISCX dataset, achieving a detection rate above 99%.

The authors of [19] combined a statistical approach with machine learning techniques to detect attacks. The statistical approach typically models normal and abnormal network traffic with predetermined distributions and distance measures, while the machine learning stage uses K-means, SVM, decision tree, Naive Bayes and AI algorithms as classifiers.

A solution for detecting DDoS attacks in IoT network infrastructures is proposed in [20], where sFlow- and adaptive polling-based sampling techniques are applied in the data plane to handle high traffic flows. After the distributed traffic is sampled in the data plane, Snort IDS and stacked autoencoders (SAE), an unsupervised algorithm, are applied to distinguish normal traffic from attacks with high accuracy and low FPR.

A general assessment in [21] considers the deep learning models of convolutional neural networks, deep neural networks, recurrent neural networks and deep Boltzmann machines. Each model is evaluated in both binary and multiclass classification on the CSE-CIC-IDS2018 and Bot-IoT datasets, which contain real traffic. The authors report that their method is costly and complex to implement because it requires special hardware such as graphics processing units (GPUs) and hundreds of software machines.

The authors of [22] proposed a dynamic multilayer perceptron (MLP) combined with a feature selection technique to detect DDoS attacks, where a feedback mechanism rebuilds the detector when detection becomes inaccurate. In their model, as network traffic becomes more complex and changes, some of the selected features can no longer distinguish normal traffic from attacks, which causes detection failures. Their method performs well compared with its counterparts, and the feedback mechanism can reduce FPR and FNR.

3 The proposed method

This study combines an entropy-based method with classification algorithms to detect high-volume and low-volume DDoS attacks, treating detection as a two-class problem that distinguishes normal flows from attacks. Fig. 3 shows the three applications introduced in the Floodlight controller [23] for collecting flows and calculating entropy.

The method shown in Fig. 3 consists of the collector, entropy-based and classification sections, which work together in the Floodlight controller to detect DDoS attacks. Each section is described below.

3.1 Collector section

This section collects the statistics of network flows and communications recorded by the switches over a specific period of time. These statistics include the total number of bytes sent, the number of packets sent and the flow duration. When a connection is established between two hosts, the first packet is sent to the controller, which stores it together with its source IP, source port, destination IP, destination port, packet bytes and packet arrival time [24]. The same holds for all packet-in messages. Once all flows are available, the statistics between each pair of hosts are obtained and passed to the controller.
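The collector's bookkeeping can be sketched as follows, assuming each packet-in record carries the fields listed above. The record type `PacketIn` and the function `collect_window` are hypothetical names for illustration, not Floodlight APIs; the sketch groups the records of one time window by host pair, sums packets and bytes, and takes the flow duration as the last minus the first arrival time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketIn:
    """Hypothetical packet-in record with the fields the collector stores."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    nbytes: int
    arrival: float  # arrival time in seconds


def collect_window(records, start, length):
    """Aggregate per host-pair statistics for records in [start, start + length).

    Returns {(src_ip, dst_ip): {"packets", "bytes", "duration"}}.
    """
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for r in records:
        if not (start <= r.arrival < start + length):
            continue  # outside the monitoring period
        s = stats[(r.src_ip, r.dst_ip)]
        s["packets"] += 1
        s["bytes"] += r.nbytes
        s["first"] = r.arrival if s["first"] is None else min(s["first"], r.arrival)
        s["last"] = r.arrival if s["last"] is None else max(s["last"], r.arrival)
    return {
        pair: {"packets": s["packets"], "bytes": s["bytes"],
               "duration": s["last"] - s["first"]}
        for pair, s in stats.items()
    }


records = [
    PacketIn("10.0.0.1", 5000, "10.0.0.9", 80, 600, 0.5),
    PacketIn("10.0.0.1", 5000, "10.0.0.9", 80, 400, 2.5),
    PacketIn("10.0.0.2", 5001, "10.0.0.9", 80, 300, 1.0),
    PacketIn("10.0.0.3", 5002, "10.0.0.9", 80, 100, 7.0),  # falls in the next window
]
print(collect_window(records, start=0.0, length=5.0))
```

Keying the statistics by (source IP, destination IP) mirrors the text's "statistics between the two hosts"; a real controller module would read these counters from OpenFlow flow statistics rather than from a Python list.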

3.2 Entropy-based section

This section uses entropy to detect most of the attacks. The main advantage of entropy-based filtering is that it filters suspicious flows quickly and conveniently. It is easy to develop and deploy in SDN environments because it imposes a low CPU load on the controller and is simple to implement.

DDoS attacks impose additional overhead and disrupt Web activities; therefore, the state of the target system is measured by calculating the entropy of the IP addresses in the SDN network. To calculate the entropy, assume a time window W with n distinct elements, where $X_{(i, t)}$ is observation i in the set at time t. The size of W in Eq. (1) is called the window size [25].

$$\begin{aligned} W=\{{{X}_{(1,t)}},{{X}_{(2,t)}},\ldots ,{{X}_{(n,t)}}\} \end{aligned}$$

(1)

where W is the time window and ${X}_{(i,t)}$ is the number of flows of element i at time t, among n different possible states.

The probability of ${X}_{(i,t)}$ occurring in W is calculated through Eq. (2):

$$\begin{aligned} p({{X}_{(i,t)}})=\frac{{{X}_{(i,t)}}}{n} \end{aligned}$$

(2)

where $p({{X}_{(i,t)}})$ is the occurrence probability of each ${{X}_{(i,t)}}$ in W.

To calculate the entropy ${{H}_{(i,t)}}$, the probability of each element in the set is multiplied by its logarithm, and the products are summed and negated, as in Eq. (3):

$$\begin{aligned} {{H}_{(i,t)}}=-\sum \limits _{i=1}^{n}p({{X}_{(i,t)}})\log p({{X}_{(i,t)}}) \end{aligned}$$

(3)

where $p({{X}_{(i,t)}})$ is the occurrence probability of each IP.
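As a worked illustration of Eqs. (1) to (3), the sketch below computes the entropy of the destination IP addresses seen in one window. It assumes that n in Eq. (2) is the total number of observations in W, so that the probabilities sum to one, and that the logarithm is natural; the choice of destination IP as the observed element and the name `window_entropy` are illustrative assumptions.

```python
import math
from collections import Counter


def window_entropy(observations):
    """Entropy of one time window W (Eqs. 1-3).

    observations: elements seen in W, e.g. destination IPs.
    p(X_i) = count of element i / total observations in W (assumed reading of Eq. 2).
    """
    counts = Counter(observations)          # X_(i,t) for each distinct element i
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total                   # Eq. (2)
        entropy -= p * math.log(p)          # Eq. (3), natural logarithm
    return entropy


normal = ["10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"] * 5
attack = ["10.0.0.9"] * 18 + ["10.0.0.2", "10.0.0.3"]
print(round(window_entropy(normal), 3))  # uniform over 4 hosts: ln 4 = 1.386
print(round(window_entropy(attack), 3))
```

Concentrating most requests on one destination lowers the entropy well below that of the evenly spread window, which is the property the threshold test relies on.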

If the calculated entropy falls below a threshold Thr, as expressed in Eq. (4), an attack is reported; traffic concentrated on a few addresses lowers the entropy of their distribution.

$$\begin{aligned} {{H}_{(i,t)}}<Thr \end{aligned}$$

(4)

where Thr is the detection threshold for the network.
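A minimal sketch of the test in Eq. (4), assuming the traffic has already been split into consecutive windows of destination addresses and that Thr is given. The helper `detect_windows` is hypothetical; it returns the indices of the windows whose entropy falls below Thr.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values()) if total else 0.0


def detect_windows(windows, thr):
    """Return indices of windows reported as attacks, i.e. H < Thr (Eq. 4)."""
    return [k for k, w in enumerate(windows) if entropy(w) < thr]


windows = [
    ["h1", "h2", "h3", "h4"],          # spread over four destinations
    ["h9", "h9", "h9", "h9", "h1"],    # concentrated on one victim
    ["h1", "h2", "h2", "h3"],
]
print(detect_windows(windows, thr=0.8))  # -> [1]
```

The entropies of the three windows are about 1.386, 0.500 and 1.040, so only the concentrated window falls below the illustrative threshold of 0.8.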

The optimal entropy for each period is determined by testing different time periods. Changing the time period is easy in an SDN controller, and this flexibility is an advantage of SDN networks. Both the length of the time period and the threshold value affect attack detection. Static and dynamic thresholds are introduced in [26], and the detection of high-volume DDoS attacks on the DARPA2000 dataset is assessed in [27]. The DARPA2000 attacks were generated by experts with DDoS attack software, so they are simple in structure and type compared with the complexity of real data. In this study, both thresholds are evaluated for high- and low-volume attacks by running tests on datasets collected from actual SDN networks, and a threshold calculation method is proposed and compared to select the best threshold value for each type of attack.

3.2.1 Static threshold

This threshold has a fixed value, derived from packets known to belong to DDoS attacks. Normal traffic and attack traffic are sent separately into the network in different time periods. The mean entropy over these periods is calculated once for the attack mode and once for the normal mode, and the static threshold is their midpoint, as in Eq. (5).

$$\begin{aligned} Thr={{T}_{1}}=\frac{\mathop {\overline{H}}\nolimits _{attack}+\mathop {\overline{H}}\nolimits _{normal}}{2} \end{aligned}$$

(5)

where $\mathop {\overline{H}}_{attack}$ is the average entropy of the attack flows and $\mathop {\overline{H}}_{normal}$ is the average entropy of the normal flows.
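A sketch of Eq. (5), assuming labeled windows of normal-only and attack-only traffic are available, as when the two kinds of traffic are sent separately. The name `static_threshold`, the entropy helper (natural logarithm) and the toy windows are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import mean


def entropy(values):
    """Shannon entropy (natural log) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values()) if total else 0.0


def static_threshold(normal_windows, attack_windows):
    """Eq. (5): midpoint of the mean attack entropy and the mean normal entropy."""
    h_attack = mean([entropy(w) for w in attack_windows])
    h_normal = mean([entropy(w) for w in normal_windows])
    return (h_attack + h_normal) / 2


# Toy labeled windows: normal traffic spread over four hosts,
# attack traffic aimed at a single victim "v".
normal = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
attack = [["v"] * 10, ["v"] * 6]
thr = static_threshold(normal, attack)
print(round(thr, 3))  # (ln 4 + 0) / 2 = ln 2, about 0.693
```

Because the threshold is computed once from past labeled periods, it stays fixed afterwards, which is the lack of flexibility the dynamic threshold is meant to address.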

3.2.2 Dynamic threshold

The dynamic threshold is calculated with a time-series-based method, because this approach detects DDoS attacks quickly in small time windows, as in Eq. (6):

$$\begin{aligned} Thr={{T}_{2}}={{\bar{H}}_{(i,t-1)}}+{{C}_{d}}\cdot {{\sigma }_{{{H}_{(i,t-1)}}}} \end{aligned}$$

(6)

where ${{\bar{H}}_{(i,t-1)}}$ is the mean of the entropy values of the previous periods, as in Eq. (7), ${{\sigma }_{{{H}_{(i,t-1)}}}}$ is their standard deviation (SD) at time $t-1$, as in Eq. (8), and $C_d$ is a constant coefficient determined experimentally, which does not depend on the time period or on previous entropy values.

$$\begin{aligned} {{\overline{H}}_{(i,t-1)}}=\frac{1}{t-1}\sum \limits _{i=1}^{t-1}{{{H}_{(i,t-1)}}} \end{aligned}$$

(7)

$$\begin{aligned} {{\sigma }_{{{H}_{(i,t-1)}}}}=\sqrt{\frac{1}{t-1}\sum \limits _{i=1}^{t-1}{({{H}_{(i,t-1)}}}-{{\bar{H}}_{(i,t-1)}}{{)}^{2}}} \end{aligned}$$

(8)

where ${{H}_{(i,t-1)}}$ denotes the entropy values of the previous time periods and ${{\bar{H}}_{(i,t-1)}}$ is their average. At this stage, the entropy and the dynamic threshold are calculated for each time period using a $\text {C}_\text {d}$ value calibrated specifically for the dataset. If the entropy is below the threshold, an attack is detected and the alarm-rate parameter, which counts attack alarms, is incremented. $\text {C}_\text {d}$ is an empirical parameter whose value affects the accuracy of attack detection. Because the best value of $\text {C}_\text {d}$ depends on several parameters, choosing it for each time period requires balancing them: the chosen value should detect all attacks without changing the number of time periods, impose a low computational burden and generate a low false alarm rate.
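The dynamic threshold and the alarm counter can be sketched as below, assuming a precomputed series of per-period entropies. The sketch computes the mean and standard deviation over the t-1 previous periods, dividing by the number of values and taking the square root so that σ is a standard deviation. It also keeps every previous period in the history, including flagged ones, which the text does not specify. The `warmup` parameter, the function names and the value C_d = -2 are hypothetical; a negative C_d places the threshold below the running mean.

```python
import math


def dynamic_threshold(history, c_d):
    """Eq. (6): T2 = mean + C_d * SD of the entropies of periods 1..t-1."""
    k = len(history)                       # k = t - 1 previous periods
    mu = sum(history) / k                  # mean of previous entropies
    sigma = math.sqrt(sum((h - mu) ** 2 for h in history) / k)  # standard deviation
    return mu + c_d * sigma


def run_detector(entropies, c_d, warmup=3):
    """Scan per-period entropies; raise an alarm whenever H_t < T2.

    `warmup` (hypothetical) is the number of initial periods used only to
    seed the history. Returns (alarm_count, flagged_periods).
    """
    alarms, flagged = 0, []
    for t in range(warmup, len(entropies)):
        thr = dynamic_threshold(entropies[:t], c_d)
        if entropies[t] < thr:
            alarms += 1                    # alarm-rate counter
            flagged.append(t)
    return alarms, flagged


# Hypothetical entropy series; period 4 mimics a concentrated attack.
series = [2.0, 2.1, 1.9, 2.0, 0.6, 2.05]
print(run_detector(series, c_d=-2.0))  # -> (1, [4])
```

Because the threshold is recomputed from recent history, it follows gradual changes in normal traffic instead of relying on a value fixed in advance.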

To select the best $\text {C}_\text {d}$, the candidate values that achieve a TPR of 100% in each time period are kept first; among them, the value with the lowest FPR is chosen and taken as the best $\text {C}_\text {d}$ at the best time period.
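This selection rule can be sketched as a grid search, assuming the TPR and FPR of every candidate pair of time period and C_d have already been measured on labeled data. The helpers `rates` and `select_best` are hypothetical, and the numbers in `results` are invented for illustration, not results from this article.

```python
def rates(labels, flagged):
    """Return (TPR, FPR) for per-window ground truth (True = attack) and predictions."""
    tp = sum(a and p for a, p in zip(labels, flagged))
    fn = sum(a and not p for a, p in zip(labels, flagged))
    fp = sum(p and not a for a, p in zip(labels, flagged))
    tn = sum(not a and not p for a, p in zip(labels, flagged))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def select_best(results):
    """Pick the (period, C_d) pair: TPR must be 100%, then the lowest FPR wins.

    results maps (period_seconds, c_d) -> (tpr, fpr). Returns None when no
    candidate detects every attack.
    """
    perfect = {key: fpr for key, (tpr, fpr) in results.items() if tpr == 1.0}
    if not perfect:
        return None
    return min(perfect, key=perfect.get)


# Invented measurements for illustration only.
results = {
    (5, -1.0): (1.0, 0.20),
    (5, -2.0): (0.9, 0.05),
    (10, -1.5): (1.0, 0.08),
    (10, -3.0): (0.8, 0.00),
}
print(select_best(results))  # -> (10, -1.5)
```

Filtering on TPR first and only then minimizing FPR encodes the priority stated in the text: no attack may be missed, and false alarms are reduced only among candidates that satisfy that requirement.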

Once the best time period and best $\text {C}_\text {d}$ value are determined, the portion of the flows suspected of being an attack is selected and forwarded to the classification section to increase detection accuracy. Because this step removes the normal flows that are correctly identified as normal, the numbers of normal and attack flows delivered to the classification section are more balanced.
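A toy illustration of why this filtering balances the classes, assuming each flow is tagged with the index of the window it belongs to and, for evaluation only, with a ground-truth label. The function `forward_suspicious` and all counts are hypothetical.

```python
from collections import Counter


def forward_suspicious(flows, flagged_windows):
    """Keep only flows whose window was flagged by the entropy-based section.

    flows: list of (window_index, label) pairs, label in {"normal", "attack"}.
    flagged_windows: set of window indices with H < Thr.
    """
    return [f for f in flows if f[0] in flagged_windows]


flows = ([(0, "normal")] * 40 + [(1, "normal")] * 5 + [(1, "attack")] * 8
         + [(2, "normal")] * 40 + [(3, "normal")] * 3 + [(3, "attack")] * 6)
before = Counter(label for _, label in flows)
after = Counter(label for _, label in forward_suspicious(flows, {1, 3}))
print(before)  # 88 normal vs 14 attack flows
print(after)   # 8 normal vs 14 attack flows
```

In this example the classifier receives 22 flows instead of 102, and the normal-to-attack ratio drops from about 6:1 to below 1:1.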

3.3 Classification section

The input to this section is the portion of the dataset that the entropy-based section identified as a potential attack. As shown in Fig. 4, each flow is treated as an edge connecting two host nodes of a graph.

To collect the flows and extract the features, each IP address is first treated as a node, and all connections between the two endpoint nodes of a flow and other nodes are used to obtain the features. Feature extraction considers the neighbors of each node, as listed in Table 1.
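The sketch below illustrates the general idea under stated assumptions: each flow (source IP, destination IP) becomes an undirected edge, and per-flow features are computed from the neighbor sets of its two endpoint nodes. The four features shown (degrees, common neighbors, Jaccard coefficient) and the helper names are hypothetical examples chosen for illustration; they are not the article's 15 features listed in Table 1.

```python
from collections import defaultdict


def build_host_graph(flows):
    """Undirected host graph: each flow (src_ip, dst_ip) is an edge between two nodes."""
    neighbors = defaultdict(set)
    for src, dst in flows:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


def edge_features(neighbors, src, dst):
    """Example neighborhood features for one flow (illustrative, not Table 1)."""
    n_src, n_dst = neighbors[src], neighbors[dst]
    common = n_src & n_dst
    union = n_src | n_dst
    return {
        "src_degree": len(n_src),
        "dst_degree": len(n_dst),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


flows = [("a", "v"), ("b", "v"), ("c", "v"), ("a", "b")]
g = build_host_graph(flows)
print(edge_features(g, "a", "v"))
```

A victim host under attack tends to acquire many neighbors in a short window, so degree-style features computed this way give the classifier a signal that the entropy value alone does not provide.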

Table 1 Features extracted
