1 Introduction

Availability is a critical issue in modern distributed systems. Distributed Denial of Service (DDoS) attacks are coordinated attacks against the availability of network services, launched from several compromised computing systems [31]. They deplete the connectivity and processing resources of the victim, causing partial or total unavailability of services to genuine users [41]. The attackers access databases, servers and network applications remotely [25]. One of the earliest DDoS attacks was launched against Yahoo in 2000; it caused a total unavailability of services for a significant period of time and severe financial losses [42].

A DDoS attack comprises the Attacker, Agents, Bots and the Victim. The attacker uses many compromised machines (bots), through some agents, to launch attacks on the target system. The use of botnets has emerged as a major approach to launching sophisticated DDoS attacks [22, 38]. A botnet comprises a large number of malware-infected devices that are remotely controlled by a malicious user [16]. Typically, the botmaster sends commands to each bot in the botnet to commence an attack session. Often, the IP addresses of the bots are spoofed, which makes trace-back extremely challenging.

DDoS attacks are classified into two essential types: flooding-based and vulnerability-based attacks [19]. Flooding-based attacks use huge volumes of useless requests to exhaust vital resources of the victim [5]. They aim at bandwidth depletion or memory exhaustion, so that victims become incapable of providing services to authorized users [17]. Vulnerability-based attacks, on the other hand, exploit one or more flaws in an application or a bug in the software that implements the target system. They exhaust an excessive amount of the victim’s resources using a few crafted requests [1].

Flooding-based DDoS attacks can be severe enough to drain all network resources within a short time [31]. They can be executed at the Network/Transport and Application layers using several protocols, such as UDP, TCP, ICMP and HTTP [30]. The most frequent DDoS attacks occur over the User Datagram Protocol (UDP) [14]. These attacks cause devastating effects such as service interruption, degradation of service, customer dissatisfaction, reputational damage, huge financial losses, security implications and breach of contracts. Notably, severe flooding attacks have been launched against many popular organizations, including the websites of Twitter, Netflix, The New York Times, CNN, Amazon, Yahoo, BBC and eBay.

Understanding the trends of Distributed Denial of Service (DDoS) attacks and their attack strategies is an important phase in developing effective defenses [37]. The design of an accurate detection system for flooding attacks relies on an in-depth understanding of the behaviors of attackers in networks. Network analytics comprises traffic monitoring and traffic classification [12]. However, most existing detection methods cannot accurately distinguish attack flows from benign flows. Consequently, a high false-positive rate remains a lingering challenge of current works. The choice of features for detecting malicious flows influences the accuracy of defense systems. Using a single flow feature results in ineffective detection, while selecting too many features consumes more network resources and raises computational complexity.

In this study, a behavioral model for characterizing flows in flooding-based DDoS attacks is presented. Through network analysis, three distinct traffic features, namely the flow rate, arrival rate and inter-arrival time of packets, were identified for characterizing attack flows; together they can distinguish flooding DDoS attacks from legitimate flows. These features can serve as inputs to any DDoS detection mechanism.

The remainder of this study is organized as follows. In Sect. 2, an overview of related literature is presented. Section 3 details the behavioral model for flooding-based DDoS attacks. In Sect. 4, the experimental evaluation of the developed model is explained. Section 5 concludes and summarizes the work with suggestions for further research.

2 Background and related work

Distributed Denial of Service (DDoS) attacks are a major threat to cybersecurity [39]. They are characterized by malicious behaviors that aim to deplete the network and/or system resources of the victim, and they seek to disrupt applications, web-based services or networks [11]. Flooding DDoS attacks are typically launched by a network of remotely manipulated and well-coordinated bots that simultaneously and continuously forward huge amounts of traffic to the target system [39]. The packets often arrive in high quantities, consuming the victim’s critical resources such as network bandwidth, I/O bandwidth, memory, disk space and CPU.

A Flash Event (FE) behaves similarly to a Distributed Denial of Service (DDoS) attack. Behal and Kumar [2] likened an FE to a high-rate DDoS (HR-DDoS) attack. In a Flash Event, many genuine users concurrently access a particular service, resulting in reduced server performance and unavailability of services [4]. Often, the surge in legitimate traffic results from popular events such as the Olympics, a new product launch or breaking news, and from unpredicted events such as natural disasters. However, as an FE originates from an overload by genuine users, it can be resolved through adequate load balancing and provisioning to accommodate more legitimate requests.

Meanwhile, some sophisticated DDoS attackers mimic the patterns of Flash Events to evade detection. As only a few differences exist between DDoS and FE traffic, differentiating them is challenging [36]. Several research efforts have been made towards distinguishing FE from DDoS attacks. Some works have employed entropy-based methods to differentiate FE traffic from DDoS attack traffic [2, 3, 4, 7, 8, 14, 18, 26, 27]. In addition, information theory-based metrics have been proposed in the literature for the detection of DDoS attacks [6, 10, 13, 21, 23, 28, 32]. However, these information-theory approaches suffer from low detection accuracy and high computational overheads.

As DDoS attack sources are programmed and the bots operate according to specified attack functions, detection based on anomalous traffic behavior is feasible. In the literature, several features have been employed for characterizing the flows of DDoS attacks. For instance, Tan et al. [33] used the stream duration and average byte stream rate as primary features to differentiate normal flows from attack flows. In [29], the similarity of flows, page referred and legitimacy were used to differentiate FE from DDoS attacks. Zhou et al. [43] used changes in the number of packets for identifying malicious flows. In [16], the source IP and packet rates were utilized. Also, Nugraha et al. [24] characterized SYN flood attacks by the number of packets.

In Lopez et al. [20], three features, namely the total length of backward packets, total length of forward packets and average packet size, were proposed for the identification of compromised network flows. Packet arrival patterns were used in [34] to differentiate DDoS attack traffic from flash crowds. Yu et al. [40] utilized the flow correlation coefficient to classify DDoS attacks and FE. Tinubu et al. [35] employed features such as the session rate, rate of requests, frequency of requests on a web page and time interval between successive requests to analyze users’ behaviors in HTTP GET flood attacks. In [15], the average duration of flow, average byte of flow and change in speed of flow were utilized for the identification of Flash Events and DDoS attacks in SDN. In [36], the flow features selected to distinguish between DDoS attacks and FE were the new source IPs, number of source IPs and packet inter-arrival time. Similarly, Dayal and Srivastava [9] used features such as the number of flows, flow rate, entropy of protocol, entropy of source IP and entropy of destination IP to identify and categorize possibilities of flooding DDoS attacks in SDN.

Prior research shows that a high false-positive rate is a consistent occurrence in behavioral detection systems for DDoS attacks. Most existing works focus mainly on the number of packets and some other irrelevant features, without considering the time-related behavioral characteristics of packets in the attack flows. This results in misclassifications, with high false positives and false negatives. Thus, this work is geared towards addressing these limitations by identifying the relevant features for the classification of flooding DDoS attacks.

3 Behavioral model

Attackers’ behaviors can be established by monitoring the different attack traffic launched by various botnet families on networks. Network flows are the basic data structures that can be used to analyze botnet traffic. A flow is a stream of packets passing through the same router with common source and destination IP addresses, source and destination ports and protocol. While the source IPs can be spoofed, the network flows cannot be altered by attackers.
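The flow definition above can be illustrated with a minimal sketch that groups captured packets by their five-tuple. The `Packet` record, its field names and the sample addresses are assumptions made for illustration only; they are not taken from the study.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """One captured packet; the field names are illustrative assumptions."""
    timestamp: float   # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str      # e.g. "UDP" or "TCP"
    size_bytes: int


def group_into_flows(packets):
    """Group packets by the five-tuple that identifies a flow."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    return dict(flows)


# Made-up packets: two belong to the same TCP flow, one to a UDP flow.
packets = [
    Packet(0.00, "10.0.0.1", "10.0.0.9", 5000, 80, "TCP", 60),
    Packet(0.01, "10.0.0.2", "10.0.0.9", 5001, 53, "UDP", 512),
    Packet(0.02, "10.0.0.1", "10.0.0.9", 5000, 80, "TCP", 60),
]
flows = group_into_flows(packets)
```

Each dictionary key is one flow, so per-flow statistics can be computed on the corresponding packet list.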

By the analysis of attack traffic from several botnets, the following important behavioral characteristics are established:

  1. An aggressive behavior is typical of flooding-based Distributed Denial of Service (DDoS) attack traffic. Attack sources continuously flood the victim with useless flows, without awaiting corresponding responses from the target server. A sudden surge occurs in the traffic flow over a relatively short period, as the attacker simultaneously generates traffic through its compromised bots.

  2. The distribution of source IP addresses of attackers differs from that of legitimate users. Legitimate users originate randomly from an Internet community with a dispersive distribution of IP addresses. These IP addresses, when aggregated, are subject to a Normal distribution. In contrast, the distribution of attackers’ source IP addresses is relatively concentrated according to the number of bots, with a huge number of packets per IP address. These IP addresses, when aggregated, are subject to a Poisson distribution.

  3. Attack flows are similar to one another, as the bots execute a common program logic to launch an automated attack. When aggregated, these flows possess very close values of standard deviation compared to those of legitimate traffic.
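Characteristics (2) and (3) can be checked on captured traffic with simple statistics. The sketch below is one possible way to do so, not the procedure used in this study: it counts packets per source IP and uses the coefficient of variation (standard deviation divided by the mean) as an assumed measure of how similar per-flow rates are. All numbers and addresses are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def packets_per_source(src_ips):
    """Count packets per source IP address."""
    return Counter(src_ips)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 for a zero mean)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


# Hypothetical per-flow packet rates (packets/s), chosen only for illustration.
attack_flow_rates = [980.0, 1000.0, 1020.0, 1000.0]
normal_flow_rates = [3.0, 40.0, 12.0, 95.0]

attack_cv = coefficient_of_variation(attack_flow_rates)
normal_cv = coefficient_of_variation(normal_flow_rates)

# Hypothetical source addresses: two bots send most packets, one user sends few.
counts = packets_per_source(
    ["10.0.0.1"] * 500 + ["10.0.0.2"] * 480 + ["192.0.2.7"] * 3
)
```

With these made-up values, the attack flows show a much smaller relative spread than the legitimate flows, and the packet counts are concentrated on the two bot addresses.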

3.1 Feature set selection

From the behavioral characteristics observed in the attack flows, three unique features are identified for the detection of flooding-based DDoS attacks. These features are considered the most important for detecting the attack flows: the flow rate, the arrival rate and the inter-arrival time of packets. The flow rate of packets is their sending rate, measured in bits/s. The arrival rate is the number of arrivals per unit time, measured in packets/s. The packet inter-arrival time is the difference between the arrival times of two successive packets; it ranges from milliseconds to minutes. The inter-arrival time feature allows the time of the next anticipated attack to be predicted.
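As a minimal sketch of these definitions, the three features can be computed for one flow from its packet timestamps and sizes. The function name, its parameters and the fixed observation window `window_s` are assumptions introduced for illustration.

```python
def flow_features(timestamps, sizes_bytes, window_s):
    """Return (flow_rate_bps, arrival_rate_pps, inter_arrival_s) for one flow.

    timestamps:  packet arrival times in seconds, sorted ascending
    sizes_bytes: packet sizes in bytes, in the same order as timestamps
    window_s:    length of the observation window in seconds
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if len(timestamps) != len(sizes_bytes):
        raise ValueError("timestamps and sizes_bytes must have equal length")
    flow_rate_bps = 8 * sum(sizes_bytes) / window_s        # bits per second
    arrival_rate_pps = len(timestamps) / window_s          # packets per second
    inter_arrival_s = [
        later - earlier for earlier, later in zip(timestamps, timestamps[1:])
    ]
    return flow_rate_bps, arrival_rate_pps, inter_arrival_s


# Four 100-byte packets observed in a 2-second window (made-up values).
rate_bps, rate_pps, gaps = flow_features([0.0, 0.5, 1.0, 1.5], [100] * 4, 2.0)
```

For this example the flow rate is 1600 bits/s, the arrival rate is 2 packets/s and every inter-arrival time is 0.5 s.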

Figure 1 depicts the behavioral framework of flooding DDoS attacks. The prevalent behavioral characteristics of attack flows are presented, with their corresponding flow features. Also, the proposed characterization Algorithm 1 shows the behaviors of the flow features.

Fig. 1 Behavioral framework of flooding-based DDoS attacks

Algorithm 1 (figure a)

3.2 Impact of the features on the victim

Equations 1–5 establish the relationship between the features in the behavioral model and the rate of exhaustion of the victim’s resources.

The number of packets arriving at the victim is considered as a random process. Based on the similarity of attack flows, the packet arrivals are modeled as a Poisson process with rate \(\lambda\).

Let \({N(t)}\) represent the active network flows at time \(t\),

$$N(t)=\{F_{1}(t),\,F_{2}(t),\,F_{3}(t),\,\dots,\,F_{n}(t)\}.$$
(1)

Let \(p(t)\) represent the number of packets of the network flow \(F_{n}(t)\), and let \(A\) represent a sample set of arrival rates of packets,

$$A=\{\lambda_{p} : p\in\mathbb{Z}\}.$$
(2)

The number of packets \(p\) arriving per unit time follows a Poisson distribution with probability mass function (pmf):

$$\mathrm{Poisson}\,(p)=\frac{\lambda^{p}e^{-\lambda}}{p!},$$
(3)

where \(\lambda\) is the arrival rate (packets/s).

For an attack flow with flow rate \(R_{F}\) (bits/s), the attack arrives at time \(t\) and progresses at time \(t+\delta\), where δ is the inter-arrival time. The network is in its usual state at any time \(t' < t\).

$$\delta=\frac{1}{\lambda}.$$
(4)

Packet inter-arrival times are independent and identically distributed and follow an exponential distribution, so δ in Eq. 4 is their mean value.
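The link between the Poisson arrival model and Eq. 4 can be checked numerically. The following sketch simulates arrivals by drawing exponential inter-arrival times and compares the observed mean gap with 1/λ. The rate of 200 packets/s, the 50 s duration and the fixed seed are arbitrary illustrative choices.

```python
import random
from statistics import mean


def simulate_arrivals(rate_pps, duration_s, seed=0):
    """Generate arrival times (s) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    arrivals = []
    t = rng.expovariate(rate_pps)          # first exponential inter-arrival time
    while t < duration_s:
        arrivals.append(t)
        t += rng.expovariate(rate_pps)     # next gap is drawn independently
    return arrivals


rate_pps = 200.0                            # illustrative arrival rate (lambda)
duration_s = 50.0
arrivals = simulate_arrivals(rate_pps, duration_s)

gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
mean_gap = mean(gaps)                       # expected to be near 1 / rate_pps
observed_rate = len(arrivals) / duration_s  # expected to be near rate_pps
```

With roughly ten thousand simulated packets, the mean gap should lie close to 5 ms and the observed rate close to 200 packets/s.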

Hence, the probability of exhaustion of the victim’s resources \(P_{E}\) depends directly on the flow rate \(R_{F}\) and inversely on the inter-arrival time δ of attack packets.

$$P_{E}\propto\frac{R_{F}}{\delta}.$$
(5)
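As a reading aid, Eq. 4 can be substituted into Eq. 5 to express the same relation in terms of the arrival rate. This is only a restatement of the proportionality above, not an additional result, and the numbers in the example are illustrative.

$$P_{E}\propto\frac{R_{F}}{\delta}=R_{F}\,\lambda.$$

For instance, at a fixed flow rate \(R_{F}\), raising the arrival rate from \(\lambda=100\) to \(\lambda=200\) packets/s halves δ from 10 ms to 5 ms and doubles the quantity on the right-hand side.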

4 Implementation and results

The network environment is set up using the NS2 network simulator. Attack flows, normal flows and Flash Events (FE) are generated in the network using the Scapy tool. The generated DDoS traffic forwards UDP and TCP packets to the victim server. Wireshark is employed for monitoring and capturing the network traffic. The details of the flow features of the three traffic types, as captured from Wireshark, are shown in Table 1.
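One way to turn such a capture into per-second feature series is sketched below. It assumes the packet list was exported from Wireshark as CSV with the default `Time` (seconds since the start of the capture) and `Length` (frame length in bytes) columns; the sample rows are made up, and this is not necessarily the processing used in this study.

```python
import csv
import io
from collections import defaultdict


def per_second_series(csv_text):
    """Return {second: (packet_count, bit_count)} for a CSV packet list."""
    packets = defaultdict(int)
    bits = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        second = int(float(row["Time"]))         # one-second bin index
        packets[second] += 1
        bits[second] += 8 * int(row["Length"])   # bytes converted to bits
    return {s: (packets[s], bits[s]) for s in sorted(packets)}


# Made-up export with three packets; only Time and Length are used.
sample = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.1","10.0.0.9","UDP","554","Len=512"
"2","0.400000","10.0.0.2","10.0.0.9","UDP","554","Len=512"
"3","1.200000","10.0.0.1","10.0.0.9","TCP","60","SYN"
'''
series = per_second_series(sample)
```

Each entry gives the arrival rate (packets/s) and the flow rate (bits/s) for one second of the capture.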

Table 1 Details of the relevant features in attack and normal scenarios

The behavioral model is validated with two real-world, publicly available datasets: the CICDDoS2019 dataset and the ’98 FIFA World Cup dataset. The CICDDoS2019 dataset consists of a mixture of legitimate traffic and the most recent DDoS attacks. The ’98 FIFA World Cup dataset represents the traffic of Flash Events (FE), and it is the only publicly accessible dataset that represents a Flash Event. The FE traffic was taken from the 66th day of the dataset, as it contains the highest number of requests. The effects of the selected flow features (flow rate, arrival rate and inter-arrival time of packets) are compared in Attack, Normal and FE scenarios as obtained from the datasets, and shown in Figs. 2, 3, 4, 5, 6, 7, 8, 9 and 10.

Fig. 2 Flow rate of DDoS attacks from CICDDoS2019 dataset

Fig. 3 Flow rate of normal traffic from CICDDoS2019 dataset

Fig. 4 Flow rate of FE traffic from ’98 FIFA World Cup dataset

Fig. 5 Arrival rate of DDoS attacks from CICDDoS2019 dataset

Fig. 6 Arrival rate of normal traffic from CICDDoS2019 dataset

Fig. 7 Arrival rate of FE traffic from ’98 FIFA World Cup dataset

Fig. 8 Packet inter-arrival time of DDoS attacks from CICDDoS2019 dataset

Fig. 9 Packet inter-arrival time of normal traffic from CICDDoS2019 dataset

Fig. 10 Packet inter-arrival time of FE traffic from ’98 FIFA World Cup dataset

The DDoS attack, normal and Flash Event traffic seen in the employed datasets have distinct characteristics and patterns. It can be observed from Figs. 2, 3, 4, 5, 6, 7, 8, 9 and 10 that the selected features clearly show the variance in the behavioral patterns of the three traffic types. The features are highly sensitive to the variations between them. Thus, DDoS attacks can be detected and differentiated from normal network traffic and Flash Events using relevant features such as the flow rate, arrival rate and packet inter-arrival time.

5 Conclusion and future scope

The characterization and mitigation of flooding Distributed Denial of Service (DDoS) attacks go hand in hand. An in-depth understanding of the behaviors of attack flows is essential for accurate detection. Notably, a high false-positive rate remains a prominent challenge of existing detection methods. Therefore, in this study, we characterized attack flows through network analysis using three distinct features, namely the flow rate, arrival rate and inter-arrival time of packets. The relationship between the behavioral features and the rate of exhaustion of the victim’s resources was established. The effects of the features were compared in DDoS attack, Normal and Flash Event scenarios, and the features were shown to distinguish attack traffic from legitimate traffic and Flash Events. Thus, the behavioral model lays a good foundation for the mitigation of flooding DDoS attacks.

Further work will make use of the behavioral features for the detection of attack flows using several machine-learning models.