Apache Hadoop Based Distributed Denial of Service Detection Framework

Patil, Nilesh Vishwasrao; Rama Krishna, C.; Kumar, Krishan

doi:10.1007/978-981-15-1384-8_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1025)

Included in the following conference series:

International Conference on Information, Communication and Computing Technology


Abstract

Distributed Denial of Service (DDoS) attacks are among the most powerful and severe threats to Internet-based services. They can disable victim services within a short time by overwhelming them with a huge amount of attack traffic. A sophisticated attacker closely follows current DDoS defense research, compromises millions of unsecured devices, and sends a huge amount of attack traffic (Big Data) to bring down a victim. Attack volumes have shifted from gigabits per second (Gbps) to terabits per second (Tbps). When a defense system processes such a large amount of traffic to identify attack traffic, the defense system itself can sometimes become a victim of the DDoS attack. Therefore, a DDoS defense system is needed that can efficiently process massive amounts of network traffic and immediately distinguish attack traffic. In this paper, we propose a victim-end Hadoop-based DDoS defense framework that identifies attacks using the MapReduce programming model and an information theory metric. Further, we implement a Hadoop-based DDoS testbed and validate the proposed framework using real datasets, namely MIT Lincoln LLDOS 1.0 and CAIDA, as well as live traffic generated on the testbed. The experimental results show high detection accuracy (97% on average).


1 Introduction

In the present era, organizations have moved their services online, making them accessible 24/7 to grow business and revenue. When the Internet was designed, its main objectives were fast data transfer, fast processing, and identification of packet tampering [1]. The number of Internet users and Internet of Things (IoT) devices is growing exponentially because of the Internet's easy access and decentralized nature. A Denial of Service (DoS) attack overwhelms a victim service and denies access to genuine users. A DoS attack is easily launched from a single device that continuously forwards random traffic to a victim service. However, because it originates from a single source, it can readily be identified, filtered, and traced back immediately so that legal action can be taken [1,2,3]. A Distributed Denial of Service (DDoS) attack severely degrades the performance of a victim service and can sometimes make it completely unavailable [3]. A DDoS attack is launched by compromising multiple devices and using them as bots to send unnecessary traffic toward a victim service in a coordinated manner. Therefore, detecting DDoS attacks accurately in real time is a challenging job.

According to the Kaspersky Lab report [4], the first quarter (Q1) of 2018 showed significant growth in both the number of attacks and their duration compared with the last quarter (Q4) of 2017. The report attributes the increasing incidence of DDoS attacks to causes such as the exponential growth of non-secure IoT devices, user-friendly attack tools, and security defects in networks. DDoS attack volume keeps increasing every year, despite the effective and powerful detection, mitigation, and traceback mechanisms introduced by researchers. Figure 1 shows how the volume of DDoS attacks increases every year.

The first DDoS attack was observed in June-July 1999 and reported in August 1999; it targeted a single computer system at the University of Minnesota using 227 compromised systems [5]. The attack, launched with the Trinoo DDoS tool [6], was sustained for almost two days. Since then, attack volumes have grown dramatically: in 2018, GitHub experienced the largest DDoS attack on record, at around 1.35 terabits per second (Tbps). However, GitHub recovered from this attack within 8 min [7].

Peng et al. [8] classified DDoS defense mechanisms into four broad categories: prevention, detection, traceback, and mitigation. Further, a DDoS defense system can be deployed at the victim end (destination end), at the source end, in the intermediate (core) network, or in a distributed manner [2, 9]. Bhuyan et al. [10] analyzed each deployment location and concluded that victim-end deployment is preferable for three reasons: (i) it is deployed near the victim and hence closely observes its network traffic, (ii) it is simple and cost-effective to deploy, and (iii) it receives aggregated network traffic for analysis, which improves detection accuracy and lowers the false positive rate. However, a victim-end defense system must process a large number of network traffic flows, and the system itself can sometimes become a victim of the DDoS attack. Therefore, systems are needed that retain the benefits of victim-end defense while efficiently analyzing massive amounts of network traffic on a cluster of nodes to discriminate DDoS attacks from legitimate traffic. Apache Hadoop [11] is an open-source, reliable, scalable, and distributed framework. It is one of the most powerful frameworks for storing and processing huge amounts of data (i.e., Big Data) on a cluster of nodes. In this paper, we implement a victim-end Hadoop-based DDoS defense framework that detects DDoS attack traffic using the MapReduce programming model [17], and we validate it using real datasets (MIT Lincoln LLDOS 1.0, CAIDA) and live traffic generated on the proposed testbed.

The rest of the paper is organized as follows. Section 2 discusses existing literature on Hadoop-based DDoS detection, Sect. 3 presents the proposed Hadoop-based detection framework, and Sect. 4 presents the methodology. Section 5 details our experimental setup, Sect. 6 presents the performance results, and Sect. 7 gives concluding remarks.

2 Related Work

This section outlines existing work in which researchers have used the Hadoop framework to combat DDoS attacks. Researchers have proposed numerous powerful Hadoop-based solutions against DDoS attacks, mostly addressing volume-based detection. However, attack incidents have continued to increase steadily. Modern attackers compromise millions of devices and generate low-rate DDoS attacks from them, which can easily evade volume-based detection systems.

Lee and Lee [12] proposed a Hadoop-based DDoS detection system. They implemented a counter-based detection algorithm using the MapReduce programming model and evaluated it on a testbed. However, they validated the system using offline batch processing only. According to their performance evaluation, the system requires approximately 25 min to process 500 GB and 47 min to process 1 TB of network traffic, whereas approximately 5 to 10 min can be enough for an attack to crash a victim service and deny access to legitimate users. Khattak et al. [13] proposed a Hadoop-based DDoS forensics framework using the MapReduce programming model. They applied a “horizontal threshold” and a “vertical threshold” within distinct time windows. They verified the system using the real MIT Lincoln LLDOS 1.0 dataset [23] and efficiently detected high-rate DDoS (HR-DDoS) attacks. However, the system was validated only with offline batch processing, and low-rate DDoS (LR-DDoS) attacks can easily evade it. Zhao et al. [14] proposed a Hadoop and HBase based DDoS detection framework using a neural network. They implemented a testbed on a cloud platform comprising a victim web server, attacker nodes, and the defense system. However, the system requires considerable time for its training and testing phases.

Dayama et al. [15] proposed a Hadoop-based DDoS protection framework. They used the MapReduce programming model to implement a detection algorithm based on a threshold value (the number of requests) to discriminate DDoS attack traffic from genuine traffic flows. However, a sophisticated attacker performing an LR-DDoS attack can evade such a defense system, whereas during a flash event [16], genuine users may be treated as attack traffic.

Hameed et al. [18] proposed a Hadoop-based framework to combat DDoS attacks. They designed a MapReduce-based detection algorithm for four influential attacks, namely ICMP, UDP, TCP-SYN, and HTTP-GET floods, and later extended their own work [19] by proposing the HADEC framework to detect HR-DDoS attacks within a reasonable time. They generated attack traffic using the Mausezahn tool [20] and mixed in legitimate traffic. The HADEC framework comprises a traffic capturing server, a detection server (namenode), and 2 to 10 datanodes. Threshold values of 500 and 1000 are used to discriminate between attack traffic and genuine network traffic. However, the traffic capturing server accounts for almost 77% of the total detection time, and because of these fixed thresholds, an LR-DDoS attack can easily evade the defense system and be treated as legitimate traffic. Chhabra et al. [21] presented a Hadoop-based forensic analytics system for DDoS and implemented it using a supervised machine learning algorithm. They validated the framework using the CAIDA dataset and reported 99.34% detection accuracy. However, the proposed system requires considerable time for its training and testing phases, and it was validated only on real datasets.

Most of the existing literature uses volume-based detection to discriminate DDoS attacks from legitimate network traffic. Nowadays, sophisticated attackers compromise millions of unsecured devices and launch an LR-DDoS attack from each one, so that a tremendous aggregate amount of useless network traffic converges on the victim server while each individual source stays below volume thresholds. In this paper, we propose a victim-end Hadoop-based DDoS defense system using an information theory metric, namely Shannon entropy [22].
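The evasion argument can be illustrated with a small sketch. It assumes, hypothetically, that a count-based detector of the kind described above flags any source IP whose request count within one time window exceeds a fixed threshold (500 here, one of the values used by HADEC); the IP addresses and traffic volumes are made up. A single high-rate source is caught, while 10,000 bots sending 20 requests each deliver 40 times more total traffic without any single source crossing the threshold.

```python
from collections import Counter


def flag_sources_by_count(src_ips, threshold):
    """Return source IPs whose request count in one window exceeds `threshold`.

    Hypothetical per-source, count-based detector used only to illustrate
    why fixed volume thresholds miss low-rate DDoS traffic.
    """
    counts = Counter(src_ips)
    return {ip for ip, count in counts.items() if count > threshold}


THRESHOLD = 500  # fixed per-source threshold (assumption for this sketch)

# HR-DDoS window (made-up data): one source sends 5,000 requests.
hr_window = ["203.0.113.7"] * 5000

# LR-DDoS window (made-up data): 10,000 bots send 20 requests each.
lr_window = [f"10.0.{i // 256}.{i % 256}" for i in range(10000) for _ in range(20)]

print(flag_sources_by_count(hr_window, THRESHOLD))  # {'203.0.113.7'}
print(len(lr_window), flag_sources_by_count(lr_window, THRESHOLD))  # 200000 set()
```

Lowering the threshold to catch the bots would also flag busy legitimate clients, which is the motivation for looking at the distribution of sources rather than per-source volume.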

3 Proposed Hadoop Based DDoS Detection Framework

In this section, we propose a victim-end Hadoop-based DDoS defense system that employs Shannon entropy. The detection framework consists of two phases: (i) the network traffic sniffing phase and (ii) the detection process phase. In the sniffing phase, live traffic is captured with the Wireshark network traffic sniffer and stored in the Hadoop Distributed File System (HDFS). In the detection process phase, resources are allocated by YARN (Yet Another Resource Negotiator) to run the detection job using the MapReduce programming model. The architecture of the proposed system is depicted in Fig. 2.

The architecture in Fig. 2 proceeds in three stages: (i) live traffic from legitimate and attacker nodes is captured; (ii) the captured traffic is stored in HDFS, and YARN allocates resources for analyzing the network traffic flows; and (iii) the traffic is analyzed with the MapReduce programming model, the results are stored in HDFS, and the framework decides whether the traffic is a DDoS attack or normal traffic.

4 Methodology

Information theory plays a significant role in mathematics, physics, statistics, mechanical engineering, civil engineering, computer science and engineering, and many other areas. Information theory based detection metrics have been widely used in anomaly detection research over the past several years because they show a notable divergence between anomalous and legitimate traffic. However, information theory metrics have seldom been used to detect attack traffic in Hadoop-based DDoS detection.

4.1 Shannon Entropy

Shannon entropy is defined as

$$ SE = -\sum\nolimits_{i = 1}^{m} \frac{P_i}{S} \log \frac{P_i}{S} $$

(1)

where P_i is the total number of requests from the i-th source IP address in the time window, and S is defined as

$$ S = \sum\nolimits_{i = 1}^{m} P_i $$

(2)

Our detection framework uses Shannon entropy to discriminate DDoS attack traffic from legitimate traffic flows. We define T as the sampling period, during which the incoming packets are X_1, X_2, X_3, …, X_n, and the time window is set to 1 to analyze network traffic flows. Here, n is the total number of packets, t is the total number of time windows, and m is the number of packets in each time window; its value may differ between windows and may be zero if no packets arrive. Hence n = m_1 + m_2 + m_3 + … + m_t. These per-window quantities are required to evaluate Eq. (1).
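A minimal Python sketch of Eqs. (1) and (2) and the time-window bookkeeping above follows. It assumes packets are available as (timestamp, source IP) pairs, that timestamps and the window length share the same unit, that the sum in Eq. (1) runs over the distinct source IPs seen in a window, and that the logarithm is base 2 (the text does not fix the base; changing it only rescales the entropy). Assigning entropy 0 to an empty window (m_j = 0) is a convention of this sketch, and the function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def window_entropy(src_ips):
    """Shannon entropy (Eq. 1) of the source-IP distribution in one window.

    P_i = packets from the i-th distinct source IP, S = sum of P_i (Eq. 2).
    -(P_i/S) * log2(P_i/S) is written as (P_i/S) * log2(S/P_i) so the
    result is never negative zero.
    """
    counts = Counter(src_ips)
    s = sum(counts.values())
    if s == 0:
        return 0.0  # empty window (m_j = 0): convention of this sketch
    return sum((p / s) * math.log2(s / p) for p in counts.values())


def split_into_windows(packets, start, end, window=1.0):
    """Bucket (timestamp, src_ip) packets into t consecutive windows over [start, end).

    Returns t lists of source IPs; len(result[j]) is m_j, which may be 0.
    """
    t = math.ceil((end - start) / window)
    windows = [[] for _ in range(t)]
    for ts, src in packets:
        if start <= ts < end:
            j = min(int((ts - start) // window), t - 1)  # guard against float rounding
            windows[j].append(src)
    return windows


packets = [(0.1, "10.0.0.1"), (0.5, "10.0.0.2"), (2.2, "10.0.0.1"), (2.7, "10.0.0.1")]
windows = split_into_windows(packets, start=0.0, end=3.0)
m = [len(w) for w in windows]
assert sum(m) == len(packets)  # n = m_1 + m_2 + ... + m_t
print(m, [window_entropy(w) for w in windows])  # [2, 0, 2] [1.0, 0.0, 0.0]
```

When many distinct sources are equally active, as in an LR-DDoS window, the entropy approaches log2 of the number of sources (about 13.3 for 10,000 bots), which is what makes the metric discriminative.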

4.2 Dataset Used

The proposed framework is validated using the different live network traffic scenarios listed in Table 1. The real CAIDA dataset is also used to validate the proposed framework.

Table 1. Live traffic scenario.


The real MIT Lincoln dataset and live traffic without attacks (i.e., the no-attack scenario) are used to establish the baseline behavior of the proposed framework, from which the mean (µ) and standard deviation (σ) of the entropy values are obtained.

4.3 Detection Algorithm

In the Hadoop framework, a data processing job consists of two parts: a mapper job and a reducer job. Each block of network traffic data (the default HDFS block size is 128 MB) is processed by one mapper; hence, if the network traffic data file is split into 10 blocks, 10 mappers execute concurrently on the cluster's datanodes. The reducer job is performed by one datanode, chosen by YARN.
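The mapper/reducer split can be sketched in the style of Hadoop Streaming, where the mapper emits key/value pairs, the framework sorts them by key, and the reducer processes each key's values together. This is a local simulation under stated assumptions, not the paper's implementation: each input record is assumed to be a hypothetical "timestamp<TAB>source IP" text line exported from the capture, the key is the time-window index, and the reducer computes the Shannon entropy of Eq. (1) per window. The helper `num_map_tasks` approximates the one-mapper-per-block rule with the default 128 MB block size and ignores split-size tuning.

```python
import math
from collections import Counter
from itertools import groupby

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)


def num_map_tasks(file_size_bytes, block_size=BLOCK_SIZE):
    """Approximate number of map tasks: one mapper per HDFS block."""
    return math.ceil(file_size_bytes / block_size)


def mapper(lines, window=1.0):
    """Map phase: emit (window_id, src_ip) for each 'timestamp<TAB>src_ip' line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed records
        ts, src = fields
        yield int(float(ts) // window), src


def reducer(sorted_pairs):
    """Reduce phase: Shannon entropy (Eq. 1) of source IPs for each window_id."""
    for window_id, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        counts = Counter(src for _, src in group)
        s = sum(counts.values())  # S in Eq. (2)
        yield window_id, sum((p / s) * math.log2(s / p) for p in counts.values())


def run_local(lines, window=1.0):
    """Stand-in for Hadoop's shuffle/sort: sort mapper output by key, then reduce."""
    return dict(reducer(sorted(mapper(lines, window))))


lines = ["0.2\t10.0.0.1", "0.7\t10.0.0.2", "1.4\t10.0.0.1"]
print(run_local(lines))                # {0: 1.0, 1: 0.0}
print(num_map_tasks(10 * BLOCK_SIZE))  # 10
```

In an actual Hadoop Streaming job, the mapper and reducer would read lines from standard input and print tab-separated key/value lines to standard output, and keys would arrive at the reducer sorted as strings; grouping still works because equal keys arrive together.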

5 Experimental Setup

This section describes our experimental testbed, shown in Fig. 3. The Hadoop-based testbed consists of one sniffing node (victim), one namenode (master), three datanodes (slaves), and multiple traffic generator nodes (legitimate and attacker).

In Fig. 3, multiple attacker and legitimate systems generate live network traffic flows and send them toward the capturing server (victim) node; the captured live traffic scenarios are listed in Table 1, Sect. 4.2. The namenode only monitors the cluster and manages the metadata of the data blocks stored on the different datanodes. The datanodes (DN1, DN2, and DN3) run the mapper and reducer jobs that discriminate legitimate traffic from DDoS attack traffic. The SecondaryNamenode periodically checkpoints the namenode's metadata, which aids recovery if the namenode fails.

6 Results and Discussion

To validate the proposed framework, live traffic is generated on the testbed; details of the generated traffic are given in Table 1, Sect. 4.2. We calibrate the threshold using Eq. (3), with the mean (µ) and standard deviation (σ) obtained from the baseline behavior (i.e., the no-attack scenario).

$$ th = \mu \pm k \sigma $$

(3)

where µ is the mean entropy over the baseline time windows, k is the tolerance factor, and σ is the standard deviation of the entropy values. To measure the performance of the proposed framework, we use detection accuracy, false positive rate (FPR), and false negative rate (FNR), defined in Eqs. (4), (5), and (6); Table 2 shows the results for each performance parameter. These metrics are computed from the four entries of the confusion matrix: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). Detection accuracy is the fraction of attack events that are correctly detected. FPR is the percentage of normal traffic reported as attack traffic, and FNR is the percentage of attack traffic reported as legitimate traffic. The tolerance factor k is chosen at the point where the FPR and FNR curves cross (k = 1.0 in our use case), which provides a tradeoff between detection accuracy and FPR.
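Eq. (3) and the resulting decision rule can be sketched as follows. The sketch assumes that a window is reported as an attack when its entropy falls outside the band µ ± kσ (reading the ± in Eq. (3) as a two-sided band), and it uses the population standard deviation, since the paper does not state which estimator it uses. The baseline entropies in the example are made up.

```python
import statistics


def calibrate_threshold(baseline_entropies, k=1.0):
    """Eq. (3): return (mu - k*sigma, mu + k*sigma) from no-attack windows."""
    mu = statistics.mean(baseline_entropies)
    sigma = statistics.pstdev(baseline_entropies)  # population std. dev. (assumption)
    return mu - k * sigma, mu + k * sigma


def is_attack(entropy, bounds):
    """Flag a window whose entropy lies outside the calibrated band."""
    lower, upper = bounds
    return entropy < lower or entropy > upper


baseline = [3.0, 3.2, 2.8, 3.0]  # hypothetical no-attack window entropies
bounds = calibrate_threshold(baseline, k=1.0)
print(is_attack(3.05, bounds), is_attack(13.3, bounds))  # False True
```

The lower bound also catches windows whose entropy collapses, for example when a few sources dominate the traffic, while the upper bound catches the many-source LR-DDoS case.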

Table 2. Result indicating tradeoff between detection accuracy and FPR.


$$ \text{Detection Accuracy} = \frac{TP}{TP + FN} $$

(4)

$$ FPR = \frac{FP}{TN + FP} $$

(5)

$$ FNR = \frac{FN}{FN + TP} $$

(6)
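The performance metrics can be computed from labelled windows with a short sketch. It implements Eqs. (4) and (5) as written and computes the false negative rate as FN/(FN + TP), the standard definition that matches the description of FNR as the share of attack traffic reported as legitimate. Labels are booleans with True meaning "attack"; the counts in the example are hypothetical, not the results in Table 2.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN); True means 'attack' in both sequences."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif a:
            fn += 1  # attack reported as legitimate
        elif p:
            fp += 1  # legitimate reported as attack
        else:
            tn += 1
    return tp, tn, fp, fn


def detection_metrics(tp, tn, fp, fn):
    """Return (detection accuracy, FPR, FNR)."""
    detection_accuracy = tp / (tp + fn)  # Eq. (4)
    fpr = fp / (tn + fp)                 # Eq. (5)
    fnr = fn / (fn + tp)                 # FN / (FN + TP)
    return detection_accuracy, fpr, fnr


# Hypothetical counts: 100 attack windows and 100 legitimate windows.
print(detection_metrics(tp=97, tn=95, fp=5, fn=3))  # (0.97, 0.05, 0.03)
```

With these definitions, detection accuracy and FNR always sum to 1, so narrowing the threshold band to catch more attacks raises detection accuracy at the cost of a higher FPR.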

Figure 4 shows the threshold calibration, i.e., the choice of the tolerance factor k, which is set where the FPR and FNR values cross (k = 1.0 in our case).
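The calibration of k can be sketched as a sweep over candidate values, choosing the one where FPR and FNR are closest, which approximates their crossing point on a discrete grid. The sketch assumes the two-sided µ ± kσ rule and the population standard deviation; the baseline and labelled test entropies are toy values, for which the crossing falls at k = 1.5 rather than the paper's k = 1.0.

```python
import statistics


def rates_for_k(baseline, test_windows, k):
    """Return (FPR, FNR) when windows outside mu +/- k*sigma are flagged as attacks.

    `test_windows` holds (entropy, is_attack) pairs with ground-truth labels.
    """
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    tp = tn = fp = fn = 0
    for entropy, attack in test_windows:
        flagged = abs(entropy - mu) > k * sigma
        if attack and flagged:
            tp += 1
        elif attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


def crossing_k(baseline, test_windows, ks):
    """Pick the candidate k whose FPR and FNR are closest (approximate crossing)."""
    def gap(k):
        fpr, fnr = rates_for_k(baseline, test_windows, k)
        return abs(fpr - fnr)
    return min(ks, key=gap)


baseline = [3.0, 3.2, 2.8, 3.0]  # toy no-attack entropies
test_windows = [(3.05, False), (3.15, False), (2.75, False), (3.0, False),
                (3.1, True), (3.4, True), (3.25, True), (3.6, True)]
ks = [0.5, 1.0, 1.5, 2.0]
for k in ks:
    print(k, rates_for_k(baseline, test_windows, k))
print("chosen k:", crossing_k(baseline, test_windows, ks))  # chosen k: 1.5
```

As k grows, the band widens, so FPR falls and FNR rises; the sweep picks the grid point where the two curves meet.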

Figure 5 shows the receiver operating characteristic (ROC) curve of detection accuracy versus false positive rate. Figure 6 shows the entropy values calculated for attack and legitimate traffic flows. LR-DDoS attack traffic has higher entropy than legitimate traffic because a large number of hosts send traffic to the victim server, which spreads out the source IP distribution; this difference allows the framework to discriminate attack traffic from legitimate traffic flows.

7 Conclusions

DDoS attacks are a major threat to Internet-based services. In this paper, we have proposed a victim-end Hadoop-based DDoS detection framework. The proposed framework computes the entropy of source IP addresses on a cluster of nodes to discriminate between legitimate and attack network traffic. We observe that the proposed framework recognizes application-layer DDoS attacks (LR-DDoS) with a high detection rate (97%). The proposed system efficiently handles a large amount of network traffic and responds quickly.

References

[1] Bhuyan, M.H., Kashyap, H.J., Bhattacharyya, D.K., Kalita, J.K.: Detecting distributed denial of service attacks: methods, tools and future directions. Comput. J. 57(4), 537–556 (2013)
[2] Kumar, K., Joshi, R.C., Singh, K.: A distributed approach using entropy to detect DDoS attacks in ISP domain. In: International Conference on Signal Processing, Communications and Networking, ICSCN 2007, pp. 331–337. IEEE, February 2007
[3] Sachdeva, M., Kumar, K.: A traffic cluster entropy based approach to distinguish DDoS attacks from flash event using DETER testbed. ISRN Commun. Netw. (2014)
[4] Kaspersky DDoS attack report. https://securelist.com/ddos-report-in-q2-2018/86537/. Accessed 15 Jan 2019
[5] Kessler, G.C.: November 2002. https://www.garykessler.net/library/ddos.html. Accessed 15 Jan 2019
[6] Criscuolo, P.J.: Distributed denial of service: Trin00, tribe flood network, tribe flood network 2000, and stacheldraht CIAC-2319. Technical report, California University Livermore Radiation Lab (2000)
[7] GitHub survived the biggest DDoS attack ever recorded. wired.com, 03/01/2018. https://www.wired.com/story/github-ddos-memcached/. Accessed 24 Jan 2019
[8] Peng, T., Leckie, C., Ramamohanarao, K.: Survey of network-based defense mechanisms countering the DoS and DDoS problems. ACM Comput. Surv. (CSUR) 39(1), 3 (2007)
[9] Yu, S., Zhou, W., Jia, W., Guo, S., Xiang, Y., Tang, F.: Discriminating DDoS attacks from flash crowds using flow correlation coefficient. IEEE Trans. Parallel Distrib. Syst. 23(6), 1073–1080 (2012)
[10] Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: E-LDAT: a lightweight system for DDoS flooding attack detection and IP traceback using extended entropy metric. Secur. Commun. Netw. 9(16), 3251–3270 (2016)
[11] Apache Hadoop: Open-source software for reliable, and distributed computing. https://hadoop.apache.org/. Accessed 27 Jan 2019
[12] Lee, Y., Lee, Y.: Detecting DDoS attacks with Hadoop. In: Proceedings of the ACM CoNEXT Student Workshop, p. 7. ACM (2011)
[13] Khattak, R., Bano, S., Hussain, S., Anwar, Z.: DOFUR: DDoS forensics using MapReduce. In: Frontiers of Information Technology (FIT), pp. 117–120. IEEE (2011)
[14] Zhao, T., Lo, D.C.T., Qian, K.: A neural-network based DDoS detection system using hadoop and HBase. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 1326–1331. IEEE (2015)
[15] Dayama, R.S., Bhandare, A., Ganji, B., Narayankar, V.: Secured network from distributed DOS through HADOOP. Int. J. Comput. Appl. 118(2), 20–22 (2015)
[16] Behal, S., Kumar, K.: Detection of DDoS attacks and flash events using novel information theory metrics. Comput. Netw. 116, 96–110 (2017)
[17] MapReduce programming model. https://www.tutorialspoint.com/map_reduce/. Accessed 24 Feb 2019
[18] Hameed, S., Ali, U.: Efficacy of live DDoS detection with Hadoop. In: IEEE/IFIP Network Operations and Management Symposium (NOMS), pp. 488–494. IEEE (2016)
[19] Hameed, S., Ali, U.: HADEC: Hadoop-based live DDoS detection framework. EURASIP J. Inf. Secur. 2018(1), 11 (2018)
[20] Mausezahn: fast traffic generator. http://man7.org/linux/man-pages/man8/mausezahn.8.html. Accessed 24 Jan 2019
[21] Chhabra, G.S., Singh, V., Singh, M.: Hadoop-based analytic framework for cyber forensics. Int. J. Commun. Syst. 31(15), e3772 (2018)
[22] Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
[23] MIT LLDOS 1.0 - Scenario One. https://www.ll.mit.edu/r-d/datasets/2000-darpa-intrusion-detection-scenario-specific-datasets. Accessed 24 Jan 2019
[24] CAIDA DDoS dataset. https://www.caida.org/data/passive/ddos-20070804_dataset.xml. Accessed 24 Jan 2019


Author information

Authors and Affiliations

National Institute of Technical Teachers Training & Research, Chandigarh, India
Nilesh Vishwasrao Patil & C. Rama Krishna
University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Krishan Kumar


Corresponding author

Correspondence to Nilesh Vishwasrao Patil.

Editor information

Editors and Affiliations

University of Malaya, Kuala Lumpur, Malaysia
Abdullah Bin Gani
Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India
Pradip Kumar Das
Department of IT, Jagan Institute of Management Studies, New Delhi, India
Latika Kharb
Department of IT, Jagan Institute of Management Studies, New Delhi, India
Deepak Chahal


About this paper

Cite this paper

Patil, N.V., Rama Krishna, C., Kumar, K. (2019). Apache Hadoop Based Distributed Denial of Service Detection Framework. In: Gani, A., Das, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-15-1384-8_3


DOI: https://doi.org/10.1007/978-981-15-1384-8_3
Published: 13 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1383-1
Online ISBN: 978-981-15-1384-8
eBook Packages: Computer Science (R0)
