1 Introduction

The Internet of Things (IoT) is an enormous interconnection of computing devices, digital and mechanical machines, animals, and people, all of which gather data about their usage and surroundings and share it without requiring human intervention [1]. The data collected by connected devices enables smart decisions based on real-time information. The ubiquity of the internet, the steadily growing capacity of network connections, and the diversity of connected devices make the IoT adaptable and scalable. At the same time, the main concern in this area is the security of the data [2]. As technology advances, attacks on IoT devices are increasingly able to breach their security [3]. Some of the most common attacks are DoS, privilege escalation, firmware hijacking, brute-force password attacks, physical tampering, malicious node injection, and eavesdropping.

The main focus of this paper is on DoS and DDoS attacks. A DoS (Denial of Service) attack is a malicious attempt in which an adversary tries to disrupt the legitimate traffic of a victim system, server, network, or service by flooding the victim or its surrounding infrastructure with internet traffic. A DDoS (Distributed Denial of Service) attack is a type of DoS attack in which the adversary uses multiple computers to flood the victim with packets, overloading it and preventing legitimate users from accessing its resources. Researchers have proposed different ways to safeguard networks of connected devices. Basic steps usually taken to prevent IoT attacks include making the network more resilient, proper network segmentation and testing, and timely software and firmware patching. Other solutions that focus on specific attacks such as DoS and DDoS include Machine Learning [4] and Deep Learning algorithms.

To detect such attacks, a robust method named SAD-IoT is proposed. The method analyses DoS and DDoS attacks using various Machine Learning and Deep Learning algorithms to distinguish attack traffic from normal traffic. In this approach, four Machine Learning algorithms (Decision Trees, Naive Bayes, KNN, and Random Forest) and a Stacking ensemble were implemented on the generated feature set. The same dataset was used for the Deep Learning model (Multiclass Neural Networks). All the approaches were then compared using various performance metrics. The BoT-IoT dataset, created in the Cyber Range Lab of UNSW Canberra Cyber [5], was chosen for training. It contains more than 72 million records and includes DDoS, DoS, OS and service scan, keylogging, and data exfiltration attacks, with the DDoS and DoS attacks further categorized by protocol. In this work, only the DoS and DDoS attack data was used. For testing, a testbed was built consisting of six PCs, six sensors, two smartphones, and four Node-MCUs. Four of the six PCs were used for attacking. The remaining PCs, along with the other devices, generated benign traffic.

The rest of the paper is organized as follows. Section 2 explains the novel contributions of the paper. Existing work on the detection of DoS and DDoS attacks is reviewed in Sect. 3. Section 4 gives the required background. The detailed implementation of the proposed solution is described in Sect. 5. The experimental results and analysis are explained in Sect. 6. Finally, Sect. 7 concludes the paper.

2 Novel Contributions of this Paper

In DoS and DDoS attacks, a series of complex and diverse attacks is employed to afflict the victim’s network, system, or the services it provides to legitimate users. These attacks may come from different devices and different locations. Hence, a detection system is required that can detect diverse DoS and DDoS attacks to secure the data. Given these challenges, a survey of various types of DoS and DDoS attacks is presented and a model for their detection is proposed. The major contributions of the paper are as follows:

  • A real testbed was developed using sensors, PCs, smartphones and Node-MCUs.

  • The testbed was used to generate data consisting of attack traffic and normal traffic.

  • The generated dataset was preprocessed and used for testing.

  • For classifying attack traffic and normal traffic, the proposed work suggests Stacking with three Machine Learning algorithms (Random Forest, KNN, and Decision Trees), with Logistic Regression as the meta-classifier, for training and testing.

  • Finally, the performance of various Machine Learning and Deep Learning algorithms was compared and analysed.

3 Related Prior Research

DoS and DDoS attack detection methods keep evolving because the problem is critical for organizations. Most researchers are leveraging Artificial Intelligence for detection. Some of the existing solutions are described below.

Rohan Doshi et al. demonstrated that normal and DoS attack traffic can be easily distinguished via packet-level Machine Learning algorithms. Their feature selection was based on the hypothesis that network traffic patterns from consumer IoT devices differ from those of well-studied non-IoT networked devices [6]. Parth Bhatt et al. proposed a method that utilized a Hybrid Detection Module based on four Machine Learning algorithms. The flooding methods simulated were HTTP Flood, Slowloris, Slow HTTP post, TLS renegotiation, Co-AP Flood Multicast, Co-AP Flood Unicast, Co-AP IP Spoofing, and ARP cache poisoning [7].

Dragan Peraković et al. used artificial neural networks to classify pre-defined classes of traffic. The numbers of neurons used in the hidden layer were 30, 35, 40, 45, 50, and 55. Four publicly available datasets were used [8]. Bayu Adhi Tama et al. proposed a method that used a deep neural network to classify IoT network attacks. Three standard datasets were used. The performance metrics used were accuracy, precision, recall, and false alarm rate [9].

McDermott et al. proposed an approach in which they developed a BLSTM-RNN detection model and compared it with an LSTM-RNN for detecting four attack vectors. The models were evaluated for accuracy and loss. A labeled dataset was generated as part of their research. Both models returned high accuracy and low loss metrics for the four attack vectors used by the Mirai Botnet malware [10]. The approaches of the above-mentioned existing works are summarized in Table 1.

Table 1 Summary of related works

4 Background

This section gives a detailed explanation of the attacks and of the Machine Learning and Deep Learning algorithms used in this paper.

4.1 DoS and DDoS Attack

DoS Attack: A DoS attack deprives legitimate users of access to a machine, service, or network. It can be carried out either by flooding the server with large amounts of invalid data or by sending millions of requests to slow the server down [11]. DoS attacks on IoT devices may render them unresponsive. They may even damage the devices to such an extent that they require replacement or re-installation.

DDoS Attack: A DDoS attack is a malicious attempt to disrupt the normal traffic of the target system. It happens when the bandwidth or the resources of the target system are flooded by numerous compromised devices that are distributed globally [12]. According to Akamai researchers, about 21% of all DDoS attacks originate from IoT devices around the world.

SYN Flood: This is also known as the half-open attack. The attacker makes the victim’s IoT devices unresponsive by consuming their resources with a continuous stream of SYN packets, sent with the help of spoofed IP addresses. The attacker repeatedly sends connection requests and overwhelms all available ports on the victim’s machine, causing it to respond to legitimate traffic sluggishly or not at all [13].

UDP Flood: The attacker floods the victim’s machine with UDP packets. In this type of attack, the firewall that protects the victim is sometimes exhausted by the UDP flood, resulting in a denial of service to legitimate traffic. This attack exhausts the resources of the IoT devices and hence makes them unresponsive [13].

4.2 Machine Learning Algorithms Used in Our Solutions

This paper utilised six Machine Learning techniques in total: KNN, Decision Trees, Random Forests, Naive Bayes, Logistic Regression, and a Stacking technique that took Logistic Regression as its meta-classifier and KNN, Random Forest, and Decision Tree as its base classifiers.

KNN is a non-parametric supervised learning algorithm that relies on labeled input data. It assumes that similar things lie near each other and uses feature similarity to classify the testing data [14]. Decision Tree is a non-parametric supervised learning algorithm that can be used for both regression and classification tasks. It breaks the dataset down into smaller subsets according to the decision taken at each step, which results in a tree-like structure with decision nodes and leaf nodes [15]. Random Forest is an ensemble learning algorithm that builds on the Decision Tree: it combines a large number of decision trees to reduce the overfitting of a single tree [7] and can be used for both classification and regression tasks. Naive Bayes works on the principle of Bayes’ theorem and is used for classification tasks. It assumes that each feature makes an equal and independent contribution to the outcome, which is why it is also called idiot Bayes [16]. Logistic Regression is used as a binary classifier; it describes data and predicts the probability of a categorical dependent variable. Its main benefit is that it indicates the relationship of the dependent variable with each of its features, providing the direction of association along with the relevance of the features.

Stacking is an ensemble method that uses a meta-algorithm to combine several Machine Learning algorithms into one predictive model; ensemble methods in general aim to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking) [17]. In this paper, the model used a parallel ensemble method in which the base learners work independently. The main idea behind stacking is to train a meta-classifier on the predictions of all the base learners in the ensemble instead of combining them by hard voting.
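
The stacking arrangement described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn is available; the synthetic four-class data, the hyperparameters (`n_neighbors`, `n_estimators`, `cv`) and the train/test split are assumptions made for the example and are not the paper's dataset or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for four-class traffic data (NOT the paper's dataset).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the text; all hyperparameters are assumed values.
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
]

# Logistic Regression is trained on cross-validated predictions
# of the base learners and produces the final class label.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy on synthetic data: {accuracy:.3f}")
```

The base learners are fitted independently of one another, which matches the parallel ensemble described in the text; only the meta-classifier depends on their outputs.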

4.3 Deep Learning Algorithm Used in Our Solution

Multiclass Neural Networks: The Keras framework is used to solve the multiclass classification problem [18, 19]. The usual choice for multi-class classification is the softmax layer [20]. The softmax function extends the idea of logistic regression. It takes a vector of K real numbers as input and normalizes it into a probability distribution over the classes of the multiclass problem, so that each value lies in the range (0, 1) and the probabilities sum to 1. This additional constraint helps training converge faster. The number of nodes in the final layer of the neural network is equal to the number of output classes. Softmax is applied just before the output, with the same number of nodes as the final layer. For softmax to work easily, the class labels are one-hot encoded. This creates a classification model that is effectively a set of weights (multipliers) for each layer of the network. The initial weights are randomized rather than set to fixed values, for better results. The categorical cross-entropy loss function is used when compiling the model to measure the error between a given hypothesis and the original outcome for the inputs in the training dataset. The Adam optimization algorithm is used to adjust the weights to minimize the error on the training set.
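
To make the softmax, one-hot encoding and categorical cross-entropy steps concrete, the following is a small sketch in plain Python rather than Keras. The logit values and the mapping of class names to indices are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Map K real-valued scores to K probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_cross_entropy(target, probs, eps=1e-12):
    """Loss between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

# Assumed index order for the four traffic classes.
CLASSES = ["DDoS TCP", "DDoS UDP", "DoS", "Benign"]

logits = [2.0, 0.5, 0.1, -1.0]          # hypothetical final-layer outputs
probs = softmax(logits)                 # probability for each class
target = one_hot(0, len(CLASSES))       # true class: "DDoS TCP"
loss = categorical_cross_entropy(target, probs)

predicted = CLASSES[probs.index(max(probs))]
print(predicted, round(loss, 4))
```

Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the true class, so a confident correct prediction gives a loss close to zero.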

5 Proposed Solution: SAD-IoT

The modules involved in the proposed solution are shown in Fig. 1. The proposed solution is divided into three modules: Dataset Generation (Input Module), Feature Generation (Pre-processing Module), and Training and Testing Phases (Output Module). The first module explains the need for dataset generation, how the testbed connections were made to collect the data, and the network analysis done for it. The second module covers data collection with the Wireshark application and the generation of features using the Argus application on Linux. The final module covers the different algorithms used for training and testing.

Fig. 1 Modular diagram

5.1 Dataset Generation (Input Module)

The UNSW dataset incorporates both normal IoT-related and other network traffic, along with various types of attack traffic commonly used by botnets [5]. A portion of the DoS and DDoS attack traffic from the UNSW dataset server was used for training, whereas for testing a real-time dataset was generated using a total of 20 devices in an isolated IoT environment. The real-time dataset included both IoT and non-IoT devices. The software and hardware components used are listed in Tables 2 and 3 respectively.

Table 2 Software utilised
Table 3 Hardware utilised

5.1.1 Need for Dataset Generation

Some of the reasons for generating a new dataset are given below:

  • There are limited datasets available online to train and test different Machine Learning models as per our requirements.

  • Existing datasets incorporate a limited number of attacks.

  • The referred literature suggests that previous works lack a real-time environment for capturing attack data.

5.1.2 IoT Network Setup for Creation of Dataset

The schematic diagram of the process of generating a dataset of DoS and DDoS attack packets in the IoT network is shown in Fig. 2. The router acts as a bridge between the isolated testbed and the internet. At the start of the simulation, all the PCs generated normal internet traffic; after 15 min, the four attacking PCs started DDoS attacks first and DoS attacks later. Each attacking period lasted 30 s. After every 30 s, the attack was halted and normal internet traffic was generated in order to simulate real attacks. If attacks are prolonged in the network, there is a high chance that the attacking devices will be identified by the firewall or other detectors.

Fig. 2 Schematic diagram of our testbed

A total of 20 devices were used for the testbed. These devices were located in a separate network using two routers, one of which was connected to the internet through a LAN. The rest of the router ports were used to connect the PCs. The testbed also consisted of six PCs, four sensor nodes with multiple sensors, and two smartphones. The sensors were used to develop the IoT sensor network shown in Fig. 3. The data collected from the sensors was sent to the ThingSpeak cloud. A sample of the data collected from a sensor is depicted in Fig. 4. From this network setup, a few nodes were made the victims of the DoS and DDoS attacks in the testbed. The sensor nodes were connected to the internet via their inbuilt WiFi modules through a WiFi hotspot hosted by one of the non-attacking PCs.

Fig. 3 IoT network setup

Fig. 4 Flex sensor reading snapshot from ThingSpeak server

The testbed implementation, with the devices utilised and their connections, is shown in Fig. 5. The proposed testbed consisted of four desktops and two laptops (six PCs), two smartphones, two routers, four NodeMCUs (sensor nodes) and six sensors. The laptops were the non-attacking PCs in the testbed and the desktops acted as attackers. The smartphones generated normal traffic by browsing the internet, playing videos on YouTube and downloading apps from the Play Store. The smartphones were connected to a WiFi hotspot hosted by one of the non-attacking PCs. The testbed was isolated from the institute’s network, and a separate network was established using two routers and eighteen other devices. The isolation was done for safety. Normal data was collected by transferring files and browsing the internet. For the DDoS attack simulation, the four desktops were used to attack the sensor nodes (victims) simultaneously with DDoS (TCP and UDP flood) traffic. The LOIC application and Command Prompt’s ping of death were used to generate malicious traffic (TCP and UDP flood). The traffic packets were captured using Wireshark.

Fig. 5 Testbed implementation

5.2 Data Collection and Feature Generation (Pre-processing Module)

For training, UNSW’s raw DDoS and DoS dataset was collected and later preprocessed as per our requirements [5]. The dataset was in pcap format. Similarly, for testing, a testbed with 20 devices was set up in a real-time environment. The internet traffic was captured using Wireshark and dumped in pcap format. The pcap file contains all the information associated with the packets, but most of it is initially hidden and only a few columns, such as start time, end time and packet id, are shown by default in Wireshark. Since essential columns such as the number of ports activated during a connection could not be extracted directly, the Argus application was used to manually extract 20 new features. These new features were then mapped to every packet. After this, 8 new features were generated for the identification of DoS and DDoS attack traffic. For example, “Tpkt Saddr” represents the total packets sent by each source IP address; if the source IP address belongs to an attacker, the number of packets generated will be much higher. Similarly, the “rate” of transfer will also be high. These features are therefore important to the learning models. Data cleansing was then done using Python to fill the empty cells and to remove any unwanted string values occurring in the dataset. For example, an asterisk appended at the end of values in the “dur” column was removed. After cleaning, the dataset was normalised to the range −1 to 1 using the z-normalisation method for better training. The final labelled dataset consisted of four classes: DDoS TCP, DDoS UDP, DoS, and Benign internet traffic. It was then fed to the Machine Learning models [21]. The description of all the features is listed in Table 4.
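
The cleaning, aggregation and normalisation steps described above could look roughly like the sketch below. The flow records, the field names `saddr` and `pkts`, the derived key `tpkt_saddr`, and the choice of 0.0 as the fill value for empty cells are all assumptions made for illustration; they are not taken from the paper's scripts.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical flow records; field names and values are assumed.
flows = [
    {"saddr": "192.168.1.10", "pkts": 1200, "dur": "0.50*"},
    {"saddr": "192.168.1.10", "pkts": 1500, "dur": "0.40"},
    {"saddr": "192.168.1.20", "pkts": 12, "dur": "2.00"},
    {"saddr": "192.168.1.30", "pkts": 8, "dur": ""},
]

def clean_duration(raw, default=0.0):
    """Strip a trailing asterisk and fill empty cells (default is assumed)."""
    text = raw.strip().rstrip("*")
    return float(text) if text else default

def total_packets_per_source(records):
    """Sum the packet counts of all flows sharing a source address."""
    totals = defaultdict(int)
    for record in records:
        totals[record["saddr"]] += record["pkts"]
    return dict(totals)

def z_normalise(values):
    """Centre values on their mean and scale by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

for flow in flows:
    flow["dur"] = clean_duration(flow["dur"])

totals = total_packets_per_source(flows)
for flow in flows:
    flow["tpkt_saddr"] = totals[flow["saddr"]]  # per-source aggregate feature

z_scores = z_normalise([flow["tpkt_saddr"] for flow in flows])
print(totals)
print([round(z, 3) for z in z_scores])
```

In this toy data the first source address sends far more packets than the others, so its aggregate feature stands out after normalisation. Values produced this way are centred on zero; whether they fall inside −1 to 1 depends on the data.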

Table 4 Feature description

5.3 Training and Testing Phases (Output Module)

For training the Machine Learning models, the dataset was pre-processed and discrepancies such as null values and mismatched data types were corrected [22]. Before feeding the dataset to the models, it was normalised in order to keep outliers in check. The first model employed was the Decision Tree classifier. This model is very fast to train and easy to interpret, and its accuracy came out to be 98.599%. The problem with Decision Tree is that it tends to overfit. To overcome this problem, Random Forest was employed, which gave an accuracy of 98.756%. Since KNN is preferred in the case of large datasets, the next model used for comparison was KNN. It is robust, but it took a lot of time to train on the dataset; its accuracy came out to be 99.466%. The training accuracy was very good, but the training time was compromised. Hence, Naive Bayes was employed next, as it is simpler owing to its quicker convergence and its ability to find the influencing features. However, it performed the worst among all the aforementioned algorithms, with an accuracy of 74.274%. Finally, the Stacking algorithm was built with Decision Tree, Random Forest and KNN as its predictors and Logistic Regression as its meta-classifier. This algorithm lived up to expectations and performed the best among all the models, giving an accuracy of 99.611%, but at the cost of the longest training time.

For Deep Learning [23], the Keras framework was used to solve the multiclass problem [19]. A neural network model was created, and it was ensured that the input layer had the right number of features. Figure 6 shows a diagrammatic representation of this model, where X1, X2, ... up to X23 represent the input features. The rectified linear unit activation function, referred to as ReLU, was applied on the input layer. It was also compared with other activation functions; these are discussed in detail later in this paper. When the inputs are transmitted between neurons, weights are applied to them to control the signal between the neurons of the hidden layers. These weights are represented as W(1), W(2), and W(3) in Fig. 6. The Softmax function was used on the final layer to classify the output into 4 classes with the help of a probability distribution. The model was compiled using cross-entropy as the loss argument and Adam as the optimizer. The model was fitted on the dataset for 30 epochs and then evaluated by calculating different performance measures. It was observed that if the networks are very deep and computation is difficult, ReLU can be preferred. Leaky ReLU can be used as a solution to the gradient problems of ReLU, but computation will be more extensive. It was concluded that the activation function plays a major role in the optimization of the problem and should be chosen by considering all the requirements and information of the deep neural network model [24].

Fig. 6 Deep neural network model

6 Security and Result Analysis

This paper analyzed the detection of DDoS and DoS attacks in IoT in two scenarios. The first scenario utilised Machine Learning algorithms including Random Forest, Decision Trees, KNN and Naive Bayes. Initially these algorithms were used separately. Later, KNN, Random Forest and Decision Trees were stacked together with Logistic Regression as the meta-classifier, and the performance of the stack was compared with that of the separate algorithms. The second scenario used neural networks for a more informative analysis of attack detection. A comparison was made using different activation functions and the results were analysed. The performance metrics used were Accuracy, Undetected Rate, False Alarm Rate, Precision, Recall and F1-Score. These are described briefly below.

  • Precision: Ratio of correctly predicted positive observations to the total predicted positive observations. High precision corresponds to a low false alarm rate.

  • Recall: Ratio of correctly predicted positive observations to all observations in the actual class. It is also known as sensitivity.

  • F1-Score: Harmonic mean of Precision and Recall.

  • False Alarm Rate: The probability of false detection during testing or training. A lower value signifies better performance.

  • Undetected Rate: The ratio of the number of incorrect detections to the total number of input samples.

  • Accuracy: The ratio of the number of correct predictions to the total number of input samples.
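
The metrics listed above can be computed per class from true and predicted labels, as in the sketch below. The label lists are made up for illustration. The formulas used here for False Alarm Rate (FP / (FP + TN)) and Undetected Rate (FN / (FN + TP)) are one common reading and are an assumption; the paper's own wording may correspond to a different denominator.

```python
def per_class_metrics(y_true, y_pred, positive):
    """One-vs-rest metrics for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definitions (see lead-in): FAR = FP/(FP+TN), UR = FN/(FN+TP).
    far = fp / (fp + tn) if fp + tn else 0.0
    ur = fn / (fn + tp) if fn + tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "far": far, "ur": ur}

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels for eight flows.
y_true = ["DoS", "DoS", "DoS", "Benign", "Benign",
          "DDoS TCP", "DDoS TCP", "DDoS UDP"]
y_pred = ["DoS", "DoS", "DDoS TCP", "Benign", "Benign",
          "DDoS TCP", "DoS", "DDoS UDP"]

dos = per_class_metrics(y_true, y_pred, "DoS")
benign = per_class_metrics(y_true, y_pred, "Benign")
overall = accuracy(y_true, y_pred)
print(dos, overall)
```

For the "DoS" class in this toy example there are two true positives, one false positive, one false negative and four true negatives, which gives a precision and recall of 2/3 each.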

6.1 Analysis

The composition of the dataset is shown in Table 5. The testing dataset was designed to keep the classes balanced and so reduce any misclassification that may occur due to an imbalanced dataset. The performance was compared on the basis of parameters such as Accuracy and the Loss function.

Table 5 Dataset composition

Scenario 1: Security Analysis Using Various Machine Learning Approaches.

Initially, four separate algorithms were utilised for training and testing: Decision Trees, Random Forests, KNN and Naive Bayes. After this, a Stacked algorithm comprising three classifiers (KNN, Random Forest and Decision Trees) along with Logistic Regression as a meta-classifier was utilised. The meta-classifier combines the outputs of the classifiers used in the Stacking algorithm. Table 6 shows the detection accuracy values of the Machine Learning algorithms employed. All the models gave accuracy scores close to each other except for Naive Bayes; hence, it was not considered in the Stacking algorithm. The Stacking algorithm performed the best because it utilised the decision-making capability of four algorithms at once. The meta-classifier combined the outputs of the base classifiers to give the final output, which made it more effective than the individual classifiers.

Table 6 Accuracy of Machine Learning models

Further, performance measures such as Precision, Recall, F1-Score, False Alarm Rate and Undetected Rate for all the Machine Learning models are compared in Table 7. They are shown classwise for better comparison. A lower False Alarm Rate signifies better performance of the algorithm, and the same applies to the Undetected Rate. From the analysis of these parameters, it was observed that KNN performed the best among the individual algorithms on all the parameters and was very close to the results of the Stacking algorithm. The KNN algorithm is simple and works well on non-linear data because it is versatile and makes no assumptions about the data. This can be seen by comparing Table 7(c) and (e). From Table 7(d), it was observed that Naive Bayes performed very poorly owing to its assumption that all the features are independent; it was therefore discarded when choosing the algorithms for Stacking. The Stacking algorithm gave satisfactory results because it combined the predictions of its base classifiers. It can also be observed from Table 7 that, when separating Benign traffic from attack traffic, all the algorithms scored 100% in the Precision, Recall and F1-Score columns. Thus, the features chosen for the algorithms performed well.

Table 7 Classwise analysis of different machine learning models

The importance of each feature for the output class is depicted in Fig. 7. The Random Forest algorithm was used to generate this graph. It was done in order to understand which features were more significant than others when deciding how to split a node while training the model. The graph shows that “ltime” has the highest importance and “drate” has the least importance for the output class.

Fig. 7 Feature importance graph

The impact of feature Tportcnt_Daddr on the output classes is shown in Fig. 8.

Fig. 8 Impact of Tportcnt_Daddr on label

Deep Learning models are preferred over Machine Learning models because they can solve complex problems involving huge amounts of data. Since data traffic increases over time, Deep Learning models are expected to give better classification results in systems with high internet usage. Deep Learning automatically tries to learn the features that are important for classification, without human intervention, whereas in Machine Learning models the features are provided manually. Thus, a comparison was made with Deep Learning to check whether the features learned by this model outperform the Machine Learning results. The corresponding result analysis is given in the following section.

Scenario 2: Security Analysis Using Deep Neural Networks.

Deep Learning methodology involves multiplying each input variable by a weight and adding a bias to the product. An activation function is then applied to the result. Activation functions play an important role in a Deep Learning model: without them, the neural network would perform just like a linear regression model, because forward and back propagation would take place without any non-linear transformation. Back propagation in neural networks is used to calculate the error values related to the weights. Table 8 shows the accuracy of the model when employed with different activation functions. It was observed that the ReLU function gave the best accuracy results.
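
The claim that a network without activation functions behaves like a linear model can be checked with a small worked example. The weights, biases and input below are arbitrary numbers chosen for illustration, and the two-layer shape is an assumption that does not reflect the architecture of the model in this paper.

```python
def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: activation(W.x + b) for each neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs

def relu(z):
    return max(0.0, z)

# Arbitrary illustrative parameters: 2 inputs -> 2 hidden -> 1 output.
W1 = [[1.0, -2.0], [0.5, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, 1.0]]
b2 = [0.5]

def linear_net(x):
    """Two layers with no activation in between."""
    return dense(dense(x, W1, b1), W2, b2)[0]

def relu_net(x):
    """Same layers, with ReLU applied to the hidden layer."""
    return dense(dense(x, W1, b1, relu), W2, b2)[0]

def collapsed(x):
    """Single linear map equal to W2.(W1.x + b1) + b2."""
    w = [sum(W2[0][k] * W1[k][j] for k in range(2)) for j in range(2)]
    b = sum(W2[0][k] * b1[k] for k in range(2)) + b2[0]
    return sum(wj * xj for wj, xj in zip(w, x)) + b

x = [1.0, 2.0]
print(linear_net(x), collapsed(x), relu_net(x))
```

For this input the two-layer network without activation and the single collapsed layer give the same value, while inserting ReLU changes the output, which is the non-linear behaviour the text refers to.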

Table 8 Accuracy of Deep Learning Algorithm using different activation functions

Detailed information about how the activation functions performed in terms of Precision, Recall, and F1-Score is shown in Table 9.

Table 9 Classwise analysis of different activation functions

The above analysis was performed on the activation functions used for the hidden layers of the Deep Learning model. The first choice of activation function was tanh. It was chosen because it produces zero-centred output, thereby aiding the back-propagation process. The benefit of using the tanh function was that it mapped negative inputs (the values of the features were in the range −1 to 1 after normalization) to strongly negative outputs and inputs near zero to outputs near zero. Since it had the problem of vanishing gradients as well as the production of dead neurons, another function, ReLU, was applied to the hidden layers. This function is faster to compute and is used in almost all neural networks; both the function and its derivative are monotonic, it is differentiable for all non-zero inputs, and its range lies between 0 and infinity. The issue with ReLU is that it maps all negative values to zero, which becomes problematic if the features contain negative values. Hence Leaky ReLU was tried, which is an attempt to solve the dying ReLU problem. The range of Leaky ReLU is minus infinity to infinity, which should have benefitted this case, but ReLU still performed better for the following reasons:

  • The slope parameter of Leaky ReLU does not change during the training phase. It is predefined.

  • The Leaky ReLU function is not differentiable at 0, which may cause values to change abruptly during backpropagation.
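
The three hidden-layer activation functions compared above, together with their derivatives, can be written out as in the sketch below. The Leaky ReLU slope `alpha=0.01` is a commonly used default and is an assumption here, since the value used in the experiments is not stated; the derivative at exactly zero is taken from the negative side by convention.

```python
import math

def tanh(z):
    return math.tanh(z)

def tanh_grad(z):
    """Derivative of tanh; approaches 0 for large |z| (vanishing gradient)."""
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of ReLU; zero for all non-positive inputs."""
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    """ReLU with a small fixed slope for negative inputs (alpha is assumed)."""
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    """Derivative of Leaky ReLU; alpha is predefined and is not learned."""
    return 1.0 if z > 0 else alpha

for z in (-0.5, 0.5, 5.0):
    print(z, round(tanh(z), 4), relu(z), leaky_relu(z))
```

A negative normalised feature such as −0.5 is mapped to zero by ReLU but keeps a small negative value under Leaky ReLU, and the tanh derivative shrinks towards zero for large inputs.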

From Table 9(b), it can also be seen that ReLU’s FAR and UR values were the lowest, making it the best performer among the activation functions. ReLU in the hidden layers and Softmax in the outer layer make the best pair in this case for multiclass classification. The undetected rate for all the activation functions was highest for the DDoS TCP class, making it the trickiest attack, with a variable attacking pattern.

The Loss and Accuracy graphs of different activation functions used in the Deep Learning model are plotted in Fig. 9.

Fig. 9 Loss and accuracy graph using different activation functions

Categorical cross-entropy was used as the loss function for this model. It was chosen because it quantifies the difference between two probability distributions: the first is the distribution predicted by the model and the second is the true distribution of the classes. In the above model, the predicted probability distribution is given by the softmax function applied on the outer layer. It produces a value between 0 and 1 for each of the classes, and these values sum to 1. To calculate the error values related to the weights, the back-propagation algorithm of the artificial neural network is applied. It is necessary to choose a suitable optimization strategy to minimize the error rate. Thus, the Adam optimizer was used, as it is computationally efficient and has an adaptive learning rate.
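
The adaptive learning rate of Adam mentioned above comes from scaling each update by running estimates of the gradient's mean and squared magnitude. The following is a minimal single-parameter sketch of the standard Adam update rule; the toy objective (w − 3)², the learning rate of 0.1 and the step count are assumptions for illustration and are unrelated to the training configuration used in this paper.

```python
import math

def adam_minimise(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimise a one-dimensional function given its gradient, using Adam."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w_star = adam_minimise(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_star, 4))
```

In a neural network the same update is applied independently to every weight, with the gradients supplied by back-propagation of the cross-entropy loss.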

6.2 Comparison Between Existing Solution and Proposed Solution

A detailed comparison is shown in Table 10. The criteria chosen for it were based on the drawbacks of existing works. This paper tried to overcome these drawbacks by using both Machine Learning and Deep Learning models to identify which model performs better. Both DoS and DDoS attacks were considered for testing, using a real-time testbed for better analysis.

Table 10 Comparison between existing solution and proposed solution

7 Conclusion and Future Work

In this paper, the Machine Learning and Deep Learning algorithms and the features that are significant for the detection of DoS and DDoS attacks in a network were analyzed. The findings suggest that KNN performed quite close to the Stacking algorithm, which turned out to be the best performer in every aspect among all the Machine Learning algorithms considered in this paper, although both took a lot of time to train. Random Forest and Decision Trees gave similar detection accuracy, which was quite good considering the time taken to train the models and the other parameters considered for performance comparison. As for Deep Learning, a Deep Neural Network was implemented using different activation functions. The ReLU activation function in the inner layers paired with Softmax in the outer layer turned out to be the best performer, and its detection accuracy was very close to that of the Stacking algorithm. Hence, it can be concluded that both Machine Learning algorithms and Deep Learning models can be employed for detection. Deep Learning models are preferred for systems with abundant resources and heavy data transfer, because they utilise more resources as they learn over time and are well suited to detecting new threats. Machine Learning models are better suited to systems with fewer resources, as their resource utilisation is lower, they are fairly consistent, and they perform better with relatively little data traffic.