1 Introduction

1.1 Cyber Security

The widespread use of network-connected devices has made the world critically dependent on information and communication technology. Many malicious users try to steal credentials or simply attack host data. Over the last few years, there have been numerous examples of both proofs of concept and real-world attacks, Loukas et al. [1]. Over the years, cyber security analysts, Toch et al. [2], and experts have designed and built various cyber defence systems to protect the resources of organizations from malicious attackers. These systems address cyber security threats such as viruses, Trojans, worms and botnets, among others, Loukas et al. [3]. Existing solutions based on Intrusion Detection Systems (IDS) include proactive approaches that anticipate and remove vulnerabilities in computing systems and trigger reactive mitigation actions. Any protection mechanism needs to combine algorithms with good and precise detection capabilities that allow fast processing of the data gathered by the data sources. Without these capabilities, IDSs cannot perform their monitoring and analysis functions in real time, which makes it very difficult to detect potential cyber attacks as they begin to occur. The problem stems from the ever higher transmission rates of current networks. In particular, rates in wired networks have increased from 100 Mbps a few years ago to the current 10 Gbps. The vast volumes of data flowing through networks leave IDSs unable to collect and analyse every network packet. For example, Deep Packet Inspection (DPI) tools, Koscher et al. [4], work properly on wired networks of up to 1 Gbps but start to drop packets due to overhead from 1.5 Gbps, Checkoway et al. [5]. A recent study, Ward et al. [6], carried out extensive experiments to obtain a thorough performance comparison of Snort with machine learning techniques applied to it, assessing the ability of such an IDS to process network traffic at speeds of up to 10 Gbps. These tests show that the average packet drop rate when using Snort reaches 9.5% in 4 Gbps networks and rises to 20% in 10 Gbps networks. Because of this increase in bandwidth, IDS-based solutions that rely on deep analysis techniques have been forced to evolve towards better detection approaches. They have moved from inspecting raw network packets to analysing network traffic flows with innovative AI-based techniques, McGraw et al. [7].

1.2 Classification of Cyber Attacks

A first dimension for classifying an attack is the goal of the attack. This is often related to the way an adversary monetizes the attack (e.g., by stealing information and selling it to advertisers or criminals). Overall, the attack goals fall into one of the following categories, Lala et al. [8], as shown in Fig. 1.

Fig. 1 Classification of cyber attacks

(1) Stealing information, such as data on a device, media files and user credentials; this action is usually performed by spyware; (2) tracking user information, i.e., monitoring users' sensitive data (e.g., locations, activities or health-related data); this action is usually achieved using mobile malware; (3) taking control of a system, as is done by Trojans, botnets and rootkits, Stefano et al. [9].

A second dimension for classifying an attack is the attack vector and it represents the vulnerability exploited by an adversary to gain access to a network or computer system to perform malicious actions. Attack vectors can be identified at three different layers: Hardware, Network and Application.

The use of deep learning technologies for cyber security analysis and intrusion detection is highly relevant. Deep learning techniques are widely used for malware analysis and for finding previously unseen threats posed by malicious software [10,11,12,13,14,15].

1.3 Machine/Deep Learning

Artificial neural networks are the basis of all the latest deep learning models. These models can also include propositional formulas or latent variables organized layer-wise, as in deep generative models such as the nodes in deep belief networks and deep Boltzmann machines.

In deep learning, each layer transforms its input data into a slightly more abstract representation, and the learning process itself determines which features are best placed at which level. For example, in an image recognition application the first layer may read the raw pixels and abstract them into edges, the second layer may encode arrangements of edges, the third layer may encode parts of the image, and the fourth layer may recognize the image as a whole. The word 'deep' in deep learning refers to the number of layers through which the data are transformed.

More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output; it describes the potentially causal connections between input and output. In a feed-forward neural network, the CAP depth is that of the network (the number of hidden layers plus one). In a recurrent neural network, a signal may propagate through a layer more than once, so the CAP depth is potentially unlimited. A CAP of depth 2 has been shown to be a universal approximator, in the sense that it can emulate any continuous function to arbitrary accuracy. Beyond that depth, additional layers do not add to the function approximation ability of the network. However, deeper models extract better features than shallow ones, so the extra layers help the network learn features effectively. Deep learning thus identifies which features improve performance.
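To make the depth-2 case concrete, the following minimal Python sketch hand-wires a network of threshold units that computes XOR. It has one hidden layer plus the output layer, so its CAP depth is 2. XOR is a function that no single linear threshold unit can represent. The weights are hand-picked for illustration rather than learned, and the names `step`, `dense` and `xor_net` are hypothetical.

```python
def step(z: float) -> int:
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def dense(inputs, weights, biases):
    """One fully connected layer of threshold units."""
    return [step(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hidden layer: unit 1 behaves as OR, unit 2 as AND (hand-picked weights).
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [-0.5, -1.5]
# Output layer: fires when OR is on but AND is off, i.e. XOR.
W_OUT = [[1.0, -1.0]]
B_OUT = [-0.5]


def xor_net(x1: int, x2: int) -> int:
    hidden = dense([x1, x2], W_HIDDEN, B_HIDDEN)
    return dense(hidden, W_OUT, B_OUT)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The hidden layer re-represents the input (OR and AND), and the output layer becomes linearly separable in that new representation. This is the layer-wise abstraction described above, at the smallest possible scale.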

Deep learning architectures are frequently constructed with a greedy layer-by-layer method. In supervised learning tasks, deep learning methods obviate feature engineering. They translate the data into compact intermediate representations, similar to principal components, and derive layered structures that remove redundancy in the representation. Deep learning algorithms can also be applied to unsupervised learning tasks. This is an important benefit because unlabelled data are far more abundant than labelled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors and deep belief networks.

Not every machine learning algorithm is a deep learning algorithm. Deep learning algorithms can learn large, complex data representations, so their applications are numerous. Many other machine learning methods are single algorithms built on statistical mechanisms, such as Bayesian algorithms, or on function approximations, such as decision trees. Deep learning, by contrast, can learn from massive amounts of data. It is a neural model built on the concept of interconnected computing nodes, inspired by the complex, interconnected structure of neurons in the human brain.

Machine learning involves two inherent phases: training and inference. For training, the available data are divided into a training set, a validation set and a test set. The machine learning algorithm learns a function approximation of the training data. The validation set is used to assess the effectiveness of the training process (e.g., to tune hyperparameters). The test set determines the final accuracy and effectiveness of the trained model on data it has not seen before. Inference is the phase in which new input data are given to the trained and deployed model, which produces the inferred output.
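The three-way split above can be sketched in a few lines of Python. This is a minimal illustration, assuming a 70/15/15 split and a fixed random seed for reproducibility. The function name `train_val_test_split` and its parameters are hypothetical, not taken from any particular library.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut the data into disjoint train/validation/test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test_set = items[:n_test]
    val_set = items[n_test:n_test + n_val]
    train_set = items[n_test + n_val:]
    return train_set, val_set, test_set


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Shuffling before splitting avoids placing, for example, all of one attack type in the test set when the raw data are ordered.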

Deep learning performs learning tasks such as regression, classification, clustering and auto-encoding with multi-layer neural networks. In a network of multiple layers, each made up of multiple nodes, every node receives its input from the previous layer. The input data are thus progressively transformed into the output representation. Networks of many interconnected neurons can therefore represent more complex functions.

1.4 Types of Machine Learning

  A. Supervised Shallow Machine Learning Algorithms (SML)

    • Classification

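In classification, the output of the learning process is a class drawn from a set of classes. Binary classification distinguishes between two classes. Multi-class classification assigns one class from a set of several classes. Multi-label classification may assign several classes to the same instance. Multi-class problems can be handled by comparing every class with the other classes in a binary way, Paul et al. [16]. The classification of machine learning algorithms is shown in Fig. 2.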

Fig. 2 Classification of machine learning algorithms

  i. Naïve Bayes (NB)

Naive Bayes is a simple but surprisingly powerful algorithm for predictive modelling. It is a classification algorithm for binary (two-class) and multi-class classification problems. The technique is easiest to understand when described using binary or categorical input values. It is called naive Bayes, or idiot Bayes, because the calculation of the probabilities for each hypothesis is simplified to make it tractable. Specifically, the input features are assumed to be conditionally independent of one another given the class.
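A minimal categorical Naive Bayes classifier illustrates the independence simplification: the class score is the prior multiplied by one conditional probability per feature. The toy connection records, the feature values ("tcp", "SYN", ...) and the function names `train_nb`/`predict_nb` are hypothetical. Add-one (Laplace) smoothing is an assumption added so that unseen values do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))  # [class][feature][value]
    values = defaultdict(set)  # distinct values seen for each feature
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Pick the class with the highest log prior + sum of log likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(row):
            count = feature_counts[c][i][v]
            # Naive assumption: features are independent given the class.
            score += math.log((count + 1) / (n_c + len(values[i])))
        if score > best_score:
            best, best_score = c, score
    return best


rows = [("tcp", "SYN"), ("tcp", "SYN"), ("udp", "SYN"),
        ("tcp", "ACK"), ("udp", "ACK"), ("tcp", "ACK")]
labels = ["attack", "attack", "attack", "normal", "normal", "normal"]
model = train_nb(rows, labels)
print(predict_nb(model, ("tcp", "SYN")))  # attack
print(predict_nb(model, ("udp", "ACK")))  # normal
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.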

  ii. Support Vector Machines (SVM)

“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
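The kernel trick can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel k(x, z) = (x · z)^2 equals an ordinary dot product after the explicit mapping φ(x) = (x1², √2·x1·x2, x2²) into a 3-D feature space. An SVM can therefore work in that higher-dimensional space without ever computing φ. The sketch below assumes this particular kernel for illustration; the helper names are hypothetical.

```python
import math


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit 3-D feature map for 2-D inputs that matches poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z))    # computed entirely in the original 2-D space
print(dot(phi(x), phi(z)))  # same value (up to rounding) via the explicit map
```

For higher degrees or the RBF kernel, the implicit feature space grows very large or becomes infinite-dimensional. That is why evaluating the kernel directly is preferred.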

  iii. K-Nearest Neighbor (KNN)

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. k-NN is a type of instance-based, or lazy, learning: the function is only approximated locally, and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. For both classification and regression, a useful technique is to weight the contributions of the neighbors so that nearer neighbors contribute more to the result than more distant ones. A common scheme gives each neighbor a weight of 1/d, where d is its distance. K-Nearest Neighbors is one of the most basic yet essential classification algorithms in machine learning. It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining and intrusion detection.
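A minimal distance-weighted k-NN classifier shows both ideas from the paragraph: nothing is computed until a query arrives, and nearer neighbors receive larger (1/d) votes. Euclidean distance and the two hypothetical "normal"/"attack" clusters are assumptions for illustration. The function name `knn_predict` is also hypothetical.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; returns the weighted-vote label."""
    # Lazy learning: all the work happens at query time.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        d = math.dist(features, query)
        # Inverse-distance weight; an exact match dominates the vote.
        votes[label] += math.inf if d == 0 else 1.0 / d
    return max(votes, key=votes.get)


train = [((0, 0), "normal"), ((1, 0), "normal"), ((0, 1), "normal"),
         ((5, 5), "attack"), ((6, 5), "attack"), ((5, 6), "attack")]
print(knn_predict(train, (0.5, 0.5)))  # normal
print(knn_predict(train, (5.5, 5.2)))  # attack
```

`math.dist` requires Python 3.8 or later. Indexing structures (as in the IKPDS variant discussed later) exist to avoid sorting the whole training set for every query.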

  • Regression

  i. Logistic Regression (LR)

Logistic regression predicts the probability of an outcome that can take only two values (i.e., a dichotomy). The prediction is based on one or more predictors (numerical and categorical). Like NB, LR is a probabilistic classifier that assumes independent input features, but it uses a discriminative rather than a generative model. Its performance is highly dependent on the size of the training data.
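The following sketch fits a logistic regression by batch gradient descent on the average log-loss. The model outputs σ(w · x + b), read as the probability of the positive class. The learning rate, epoch count, one-feature toy data and function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; y values are 0 or 1."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


X = [[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logreg(X, y)
print(predict_proba(w, b, [1.0]), predict_proba(w, b, [4.0]))
```

A probability above 0.5 is usually mapped to the positive class. Other thresholds can trade false alarms against missed detections.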

  ii. Random Forest (RF)

The random forest algorithm can be used for both classification and regression problems; as a supervised algorithm, it is most often used for classification. As the name suggests, the algorithm creates a forest made up of a number of decision trees. Each tree is trained on a bootstrap sample of the training data and considers a random subset of the features when choosing splits. The forest then combines the trees' outputs by majority vote (classification) or by averaging (regression). In general, a larger number of trees in the forest tends to give more accurate results.
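The bootstrap-and-vote mechanism can be sketched with a deliberately simplified forest. Each "tree" here is a one-split decision stump trained on a bootstrap sample using one randomly chosen feature, and predictions are combined by majority vote. Real random forests grow full trees and sample a feature subset at every split. The stump simplification, the toy data and all function names are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Choose the threshold on one feature that minimises training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) examples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # random feature selection
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


rows = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.5)), predict_forest(forest, (9.5, 9.5)))
```

Because each stump sees a different bootstrap sample and feature, the individual errors are partly independent. The majority vote is therefore more stable than any single stump.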

  • Shallow Neural Network (SNN)

An SNN consists of processing elements organized into two or more communicating layers. SNNs are neural networks with a limited number of neurons and layers, and they are mainly used for classification tasks.

  B. Unsupervised Shallow Machine Learning algorithms

    • Clustering

Clustering means grouping similar data points together. Some of the best-known clustering methods are hierarchical clustering and k-means. They have limited scalability. These clustering methods are a flexible solution that can be used as a preliminary phase before adopting a supervised algorithm, or for anomaly detection purposes.
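Of the methods named above, k-means is the simplest to sketch. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, and stops when the centroids no longer change. Initializing with the first k points is an assumption made to keep the example deterministic; practical implementations usually use randomized schemes such as k-means++. The data and function name are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns final centroids and the points in each cluster."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialisation
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [  # update step (an empty cluster keeps its centroid)
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (10, 10), (1, 0), (0, 1), (11, 10), (10, 11)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

For anomaly detection, a point's distance to its nearest centroid can serve as a simple outlier score.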

  • Association

Association rule learning identifies previously unknown patterns among data items and makes them usable for prediction purposes. However, it may produce rules that are not necessarily valid, so the rules must be carefully inspected by a human expert.
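Rules such as "port_scan → failed_login" are usually ranked by two measures. Support is the fraction of transactions that contain an itemset. Confidence is support(A ∪ B) / support(A) for a rule A → B. The minimal sketch below computes both measures on hypothetical security-event sessions; the event names and function names are illustrative assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


sessions = [
    {"port_scan", "failed_login"},
    {"port_scan", "failed_login", "priv_escalation"},
    {"port_scan"},
    {"failed_login"},
]
print(support(sessions, {"port_scan", "failed_login"}))           # 0.5
print(confidence(sessions, {"port_scan"}, {"failed_login"}))
```

A rule with high confidence but very low support may be a coincidence. This is one reason the mined rules need expert inspection.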

  • Hidden Markov Models (HMM)

A hidden Markov model (HMM) is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobservable (hidden) states. An HMM can be represented as the simplest dynamic Bayesian network. Hidden Markov models are especially known for their application in reinforcement learning and temporal pattern recognition, such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, Wang et al. [17], partial discharges and bioinformatics. In cyber security, HMMs are mainly used with labelled datasets.
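The forward algorithm computes the probability of an observation sequence by summing over all hidden state paths with dynamic programming. The sketch below assumes a hypothetical two-state model ("normal"/"attack") that emits coarse traffic levels ("low"/"high"). All probabilities are invented for illustration.

```python
STATES = ("normal", "attack")
START = {"normal": 0.8, "attack": 0.2}
TRANS = {"normal": {"normal": 0.9, "attack": 0.1},
         "attack": {"normal": 0.3, "attack": 0.7}}
EMIT = {"normal": {"low": 0.7, "high": 0.3},
        "attack": {"low": 0.2, "high": 0.8}}


def forward(observations, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT):
    """P(observations) under the HMM, summing over every hidden state sequence."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


print(forward(["low", "high"]))
```

The cost grows linearly with the sequence length, whereas enumerating all hidden paths would grow exponentially.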

  • Dimensionality reduction

Dimensionality reduction reduces the number of variables that describe the data and is chiefly carried out to reduce storage space. Its different forms include principal component analysis and discriminant analysis. Auto-encoders transform the input into an encoded output form, Su et al. [18].

  • Density estimation

Density estimation is a statistical extraction or approximation of the data distribution; it finds the density of subgroups of data in order to analyse correlations, Marquardt et al. [19].
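One common density estimator is the Gaussian kernel density estimate. It places a normal "bump" of width `bandwidth` on every sample and averages the bumps. The sketch below assumes hypothetical packet sizes and a bandwidth of 20 bytes; `make_kde` is a hypothetical helper, not a library function.

```python
import math


def make_kde(samples, bandwidth):
    """Return a function x -> estimated density (Gaussian kernel, fixed bandwidth)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical packet sizes in bytes: many small packets and a few near-MTU packets.
packet_sizes = [60, 64, 60, 66, 1500, 1480]
density = make_kde(packet_sizes, bandwidth=20.0)
for size in (60, 800, 1500):
    print(size, density(size))
```

Regions of very low estimated density (here, mid-sized packets) are natural candidates for anomaly flags. The bandwidth controls how smooth the estimate is.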

  C. Reinforcement Learning

Reinforcement learning is a category of machine learning inspired by behavioural psychology. Its principle is analogous to an infant learning new things from its past behaviour in order to complete a new activity. It differs from the other categories because the learner is not given explicit instructions for completing the task; instead, it acts on its own and learns from previous experience. Real-world examples are self-driving cars that learn to reach the right destination from previous journeys, and chess programs that choose the next move from the rewards and punishments received for previous moves that led to winning the game. In each case, the program or agent perceives the environment, takes actions and tries to maximize its reward in order to achieve the goal. Many reinforcement learning methods build on dynamic programming to compute this reward-maximizing behaviour.
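The dynamic-programming side of reinforcement learning can be illustrated with value iteration on a tiny, fully known environment. The hypothetical chain has four states with moves "left" (-1) and "right" (+1), a reward of 1.0 for entering the terminal goal state 3, and a discount factor of 0.9. All of these choices are assumptions for illustration. Value iteration assumes the environment model is known; learning methods such as Q-learning estimate the same quantities from experience instead.

```python
N_STATES, GOAL, GAMMA = 4, 3, 0.9
ACTIONS = (-1, +1)  # move left / move right


def transition(state, action):
    """Deterministic move along the chain; reward 1.0 for entering the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def q_value(V, state, action):
    nxt, reward = transition(state, action)
    return reward + GAMMA * V[nxt]


def value_iteration(tol=1e-9):
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: its value stays 0
            best = max(q_value(V, s, a) for a in ACTIONS)  # Bellman optimality update
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
          for s in range(N_STATES) if s != GOAL}
print(V)       # approximately [0.81, 0.9, 1.0, 0.0]
print(policy)  # {0: 1, 1: 1, 2: 1}
```

The greedy policy extracted from the values always moves right, towards the reward. Discounting makes states farther from the goal less valuable.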

1.5 Types of Deep Learning

DL algorithms are based on Deep Neural Networks (DNNs), which are large neural networks organized in many layers and capable of autonomous representation learning, as shown in Fig. 3.

Fig. 3 Classification of deep learning algorithms

  1. Supervised DL algorithms

    • Fully-connected Feed forward Deep Neural Networks (FNN).

FNNs provide general-purpose and flexible solutions for classification tasks, at a relatively high computational cost. They are a variant of DNN in which every neuron in a layer is connected to all the neurons in the previous layer.

  • Convolutional Feed forward Deep Neural Networks (CNN).

CNNs are very effective at analysing spatial data because each neuron receives its input only from a subset of the neurons in the previous layer. For the same reason, they are less suitable for analysing non-spatial data. They have a lower computational cost than FNNs.

  • Recurrent Deep Neural Networks (RNN).

In an RNN, neurons can also send their output to neurons in previous layers, which makes RNNs harder to train than FNNs. They excel as sequence generators.

  2. Unsupervised DL algorithms

    • Deep Belief Networks (DBN).

DBNs are built from a composition of Restricted Boltzmann Machines (RBMs), a class of neural networks without an output layer. DBNs are good at feature extraction and can therefore be used for pre-training tasks. They also require a training phase, but with unlabelled datasets.

  • Stacked Auto encoders (SAE).

An SAE is composed of several autoencoders, a class of neural networks with the same number of input and output neurons. SAEs perform similarly to DBNs and work better on small datasets.

2 Machine Learning/Deep Learning Algorithms

  • Support Vector Machine

Xin et al. [20,21,22,23] discussed the Support Vector Machine, a supervised learning algorithm applied to regression and classification. It is one of the most robust classification algorithms. It plots the data items as points in an n-dimensional space, where the features of the data serve as the coordinates, and separates the different groups of data with decision boundaries. It supports both Support Vector Classification (SVC) and Support Vector Regression (SVR).

It supports both binary and multi-class classification. SVC separates sets of instances belonging to different classes with decision boundaries. The support vectors, which are the points closest to the separating hyperplane, determine the optimal separating hyperplane. During classification, input vectors mapped to one side of the separating hyperplane in the feature space fall into one class, and those on the other side fall into the other class. When the data points are not linearly separable, SVM uses kernel functions to map them into higher-dimensional spaces in which they become separable, Sharma et al. [24].
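The following minimal sketch trains a soft-margin linear SVM. It minimizes the regularized hinge loss λ/2·‖w‖² + mean(max(0, 1 − yᵢ(w · xᵢ + b))) with plain batch subgradient descent and classifies by the side of the hyperplane w · x + b = 0. This is a simplification: established SVM libraries use dedicated optimizers and support kernels. The hyperparameters, toy data and function names are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Soft-margin linear SVM via batch subgradient descent; labels must be +1 / -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Report which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -1)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])
```

Only points with margin below 1 contribute to the update. This mirrors the statement above that the support vectors closest to the hyperplane determine its position.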

Saxena [25] proposed hybrid PSO-SVM approaches for building an IDS. Two feature reduction techniques, Information Gain and BPSO, were used in the study, reducing the number of attributes to 18. The reported classification performance was 99.4% on DoS, 99.3% on Probe (Scan), 98.7% on R2L and 98.5% on U2R attacks. The method thus achieves a good detection rate for Denial of Service (DoS) attacks as well as for U2R and R2L attacks.

  • K-Nearest Neighbor (KNN)

The KNN classifier relies on a distance function that measures the difference or similarity between two instances.

Rao et al. [26] used Indexed Partial Distance Search k-Nearest Neighbor (IKPDS) to experiment with various attack types and different k values (i.e., 3, 5 and 10). A total of 12,597 samples randomly selected from the NSL-KDD dataset were used to test the classification results. The resulting accuracy was 99.6%, and the classification time was very short. The experimental results show that IKPDS gives Network Intrusion Detection Systems (NIDS) better classification results in a short time.

Vishwakarma et al. [27] proposed a kNN intrusion detection method based on the ant colony optimization algorithm (ACO), which pre-trains on the KDD Cup 99 dataset using ACO [28]. They compared the performance of kNN-ACO, a BP neural network and a support vector machine using common performance measures. The method achieved an accuracy of 94.17% and an overall FAR of 5.82%. However, only a very small dataset was used.

  • Deep Belief Network

A Deep Belief Network (DBN) is a probabilistic generative model with multiple layers of stochastic, hidden variables. RBMs and DBNs are interrelated. A DBN is built by composing and stacking a number of RBMs, which enables many hidden layers, and the activations of one RBM are used to train the next stage, so the data can be trained efficiently, Kwon et al. [29]. The Boltzmann machine (BM) is a modelling method derived from statistical physics. It is based on an energy function that can describe the high-order interactions between variables. The RBM is a topologically restricted BM: connections exist only between the visible and hidden layers, not within a layer. A BM is a neural network of symmetrically coupled, stochastic binary feedback units, composed of a visible layer and a number of hidden layers. The network nodes are of two kinds, visible units and hidden units, which together express a random network and a random environment. The learning model expresses the correlations between units through the weights that connect them.
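For a binary RBM, the standard energy function is E(v, h) = −Σᵢ aᵢvᵢ − Σⱼ bⱼhⱼ − Σᵢⱼ vᵢWᵢⱼhⱼ. Because there are no hidden-to-hidden connections, each hidden unit turns on independently given the visible layer, with p(hⱼ = 1 | v) = σ(bⱼ + Σᵢ vᵢWᵢⱼ). The sketch below evaluates both formulas for a hypothetical RBM with 3 visible and 2 hidden units. The weights and biases are invented, and training (e.g., contrastive divergence) is not shown.

```python
import math


def energy(v, h, W, a, b):
    """RBM energy E(v, h) = -a.v - b.h - v^T W h for binary vectors v and h."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction


def p_hidden_given_visible(v, W, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij), independently for each j."""
    return [1.0 / (1.0 + math.exp(-(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))))
            for j in range(len(b))]


W = [[1.0, -1.0],   # weights from visible unit 0 to hidden units 0 and 1
     [0.5, 2.0],
     [0.0, 1.0]]
a = [0.1, 0.0, -0.1]  # visible biases
b = [0.0, -0.5]       # hidden biases
v, h = [1, 0, 1], [1, 1]
print(energy(v, h, W, a, b))          # lower energy = more probable configuration
print(p_hidden_given_visible(v, W, b))
```

Stacking RBMs into a DBN amounts to feeding these hidden-unit probabilities (or samples drawn from them) as the visible data of the next RBM.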

Ding et al. [30] applied Deep Belief Nets (DBNs) to detect malware, using PE files collected from the internet as samples. Unsupervised pre-training makes DBNs less prone to overfitting than feed-forward neural networks initialized with random weights. DBNs produced better classification results than other learning techniques such as SVM, KNN and decision trees, because DBNs can also learn from additional unlabelled data. The accuracy of the method is 96.1%.

Nadeem et al. [31] combined neural networks with semi-supervised learning to achieve good accuracy using only a very small number of labelled samples. On the KDD Cup 99 dataset, the unlabelled data were processed through a Ladder Network and a DBN was used to classify the labelled data. The resulting accuracy of 99.18% is very close to that of supervised learning.

Gao et al. [32] compared different DBN structures, adjusting the number of layers and the number of hidden units in the network model, and arrived at a four-layer DBN model. Tested on the KDD Cup 99 dataset, the model achieved an accuracy, precision and FAR of 93.49%, 92.33% and 0.76%, respectively.
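The metrics quoted throughout these studies derive from confusion-matrix counts, with attacks treated as the positive class. The sketch below uses common definitions, assuming FAR = FP / (FP + TN), i.e. the false positive rate. Individual papers sometimes define FAR differently, so reported figures are not always directly comparable. The counts are hypothetical.

```python
def ids_metrics(tp, fp, tn, fn):
    """Common IDS evaluation metrics from confusion-matrix counts (attack = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "detection_rate": tp / (tp + fn),  # also called recall / true positive rate
        "far": fp / (fp + tn),             # false alarm rate (false positive rate)
        "fnr": fn / (tp + fn),             # false negative rate = 1 - detection rate
    }


# Hypothetical test run: 100 attack and 100 normal records.
m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, the accuracy is 0.925 and the FAR is 0.05. The example also shows why accuracy alone hides how many attacks were missed (here 10%).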

Zhao et al. [33] addressed three problems in intrusion detection: large amounts of redundant data, long training times and the tendency to fall into local optima. They proposed an intrusion detection method based on a deep belief network (DBN) and a probabilistic neural network (PNN). The nonlinear learning ability of the DBN converts the original data into low-dimensional data while retaining their basic attributes. The number of hidden nodes in each layer is then optimized with the particle swarm optimization algorithm to obtain the best learning performance, and the PNN classifies the low-dimensional data. Tested on the KDD CUP 99 dataset, the method achieved an accuracy, precision and FAR of 99.14%, 93.25% and 0.615%, respectively.

Alrawashdeh et al. [34] implemented a method for fine-tuning the deep network, based on a deep belief network with a logistic regression soft-max layer. To improve the overall performance of the network, the multi-class logistic regression layer was trained for 10 epochs on the improved pre-trained data. The method achieved a low false negative rate of 2.47% and a detection rate of 97.9% on the whole 10% KDD Cup 99 test dataset.

Alom et al. [35] ran a series of experiments with a DBN, trained on 40% of the NSL-KDD dataset, to assess its intrusion detection capabilities. The trained DBN network could effectively identify unknown attacks presented to it, with an accuracy of up to 97.5%.

Tan et al. [36] designed a DBN-based ad hoc network intrusion detection model and conducted a simulation experiment on the NS2 platform. The experiment shows that a DBN achieves good accuracy and can be applied to ad hoc network intrusion detection. The accuracy and FAR were 97.6% and 0.9%, respectively.

  • Recurrent Neural Networks

A recurrent neural network (RNN) is used to process sequence data. In a traditional neural network, data flow from the input layer through the hidden layer to the output layer. Adjacent layers are fully connected, but there are no connections between the nodes within a layer, so such a network cannot solve many sequence problems. The key property of an RNN is that it remembers information from previous time steps and uses it to compute the current output. In other words, the RNN relates the current output of a sequence to previous outputs. The nodes within the hidden layer are connected, and the input to the hidden layer includes both the output of the input layer and the hidden layer's own output from the previous time step. In theory, an RNN can process sequence data of any length. In practice, to reduce complexity, the current state is often assumed to depend only on a limited number of previous states.
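The recurrence described above can be written as hₜ = tanh(Wₓxₜ + Wₕhₜ₋₁ + b), with h₀ = 0. The minimal sketch below runs this update over a short sequence. The 1-D input, 1-D hidden state and weight values are assumptions for illustration, and no training is shown. The state stays non-zero after the input drops to zero, which shows the network "remembering" the earlier input.

```python
import math


def rnn_forward(inputs, W_x, W_h, b):
    """Run an Elman-style RNN over a sequence and return every hidden state.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    """
    hidden_size = len(b)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        # The right-hand side reads the previous h before it is rebound.
        h = [
            math.tanh(
                sum(W_x[j][i] * x[i] for i in range(len(x)))
                + sum(W_h[j][k] * h[k] for k in range(hidden_size))
                + b[j]
            )
            for j in range(hidden_size)
        ]
        states.append(h)
    return states


states = rnn_forward([[1.0], [0.0], [0.0]], W_x=[[1.0]], W_h=[[0.5]], b=[0.0])
print(states)  # decaying but non-zero memory of the first input
```

Because the same W_h is applied at every step, gradients can shrink or explode over long sequences. This is the difficulty that LSTM cells, as used by Staudemeyer et al., are designed to mitigate.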

Yin et al. [37] proposed an intrusion detection approach based on recurrent neural networks (RNN-IDS). The NSL-KDD dataset was used to test the performance of the model in binary and multi-class classification. The test accuracy was 83.228% for binary classification and 81.29% for multi-class classification.

Staudemeyer et al. [38] proposed an intrusion detection approach that uses an LSTM recurrent neural network classifier. On the 10% KDD Cup 99 dataset, the LSTM classifier has certain advantages over static classifiers in detecting DoS attacks. The accuracy was 93.5% and the FAR was 1.622%.

  • Convolutional Neural Networks

In the traditional neural network model, data flow from the input layer through the hidden layer to the output layer; adjacent layers are fully connected, and there are no connections between the nodes within a layer.

A Convolutional Neural Network (CNN) is a kind of artificial neural network that has become a hotspot in the fields of speech analysis and image recognition. Its weight-sharing network structure makes it more similar to a biological neural network, which reduces the complexity of the network model and the number of weights.

This advantage is more evident when the network input is a multi-dimensional image. The image can be used directly as the network input, which avoids the complex feature extraction and data reconstruction required by traditional recognition algorithms. The convolutional network is a multi-layer perceptron specifically designed to recognize two-dimensional shapes, and it is highly invariant to translation, scaling, tilting and other forms of deformation, Bu et al. [39].

CNN was the first truly successful learning algorithm for training multi-layer network structures. It exploits spatial relationships to reduce the number of parameters that must be learned, which improves the training performance of the back-propagation (BP) algorithm. As a DL architecture, CNN is designed to minimize data pre-processing requirements. The most powerful aspect of CNN is its ability to learn feature hierarchies from large amounts of unlabelled data. CNNs are therefore promising for application in the field of network intrusion detection.
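Weight sharing is easiest to see in a single convolution. One small kernel slides over the whole input, so every output position reuses the same weights. The sketch below applies a hypothetical 2×2 vertical-edge kernel to a 3×4 "image". It uses 4 weights, whereas a fully connected layer mapping the 12 inputs to the 6 outputs would need 72. As in most CNN libraries, the operation is implemented as cross-correlation (no kernel flip); the function name and data are illustrative assumptions.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # The same kernel weights are reused at every (r, c) position.
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


kernel = [[1, -1],
          [1, -1]]        # responds to dark-to-bright vertical edges
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, -18, 0], [0, -18, 0]]
```

The response appears only at the column where the intensity changes, wherever that column is. This is the local, translation-tolerant behaviour attributed to CNNs above. Approaches such as that of Wang et al. apply the same operation to traffic data rendered as images.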

Wang et al. [40] proposed a malware traffic classification method that uses a convolutional neural network and treats traffic data as images.

3 Machine/Deep Learning Platforms

3.1 Machine Learning Platforms

  • H2O

H2O is a fully open-source, distributed, in-memory machine learning platform with linear scalability. It was developed by H2O.ai; it is written in Java and can be used from Java, Python and R. H2O supports the most widely used statistical and machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more.

H2O also has an industry-leading AutoML functionality that automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models. The H2O platform is used by more than 18,000 organizations globally and is extremely popular in both the R and Python communities. It is available on the Linux, macOS and Microsoft Windows operating systems. H2O can also be used to analyse datasets in the cloud and in Apache Hadoop file systems.

  • PMLS

Parallel ML System (PMLS) is a distributed machine learning framework. It takes care of the difficult system "plumbing work", allowing you to focus on the ML. PMLS runs efficiently at scale on research clusters and cloud computing platforms such as Amazon EC2 and Google GCE. PMLS provides essential distributed programming tools to tackle the challenges of ML at scale: Big Data (many data samples) and Big Models (very large parameter and intermediate variable spaces). To address these challenges, PMLS provides two key platforms: Bosen, a bounded-asynchronous key-value store for data-parallel ML algorithms, and Strads, a scheduler for model-parallel ML algorithms.

  • Infosys Nia

Infosys Nia is a knowledge-based AI platform, built by Infosys in 2017. It collects and aggregates organizational data from people, processes and legacy systems into a self-learning knowledge base. It is designed to tackle difficult business tasks such as forecasting revenues, deciding which products should be built and understanding customer behaviour. Infosys Nia enables businesses to manage customer inquiries easily, with an order-to-cash process that delivers risk awareness in real time.

  • Accord.NET Framework

Accord.NET is a machine learning framework combined with audio and image processing libraries, written in C#. The framework is designed for developers to build applications such as pattern recognition, computer vision, computer audition (or machine listening) and signal processing for commercial use. The Accord.NET Framework is divided into several libraries for users to choose from. These include scientific computing, signal and image processing, and support libraries, with options such as common learning algorithms, real-time face detection and more.

  • IBM Watson

IBM is a major player in the field of AI, with its Watson platform housing an array of tools designed for both developers and business users. Watson is available as a set of open APIs, giving users access to plenty of sample code and starter kits, and they can build cognitive search engines and virtual agents. Watson also includes a chatbot-building platform aimed at beginners, which requires very little machine learning skill. Watson can even provide pre-trained content for chatbots, which makes training the bot much quicker.

  • DiffBlue

Founded by Daniel Kroening at the University of Oxford, DiffBlue is a dedicated code automation platform, and a simple yet helpful one. Its goal is to discover bugs, refactor code, write tests, and find and fix weaknesses in code, all through automation.

  • Nervana Neon

Nervana and Intel have joined forces to build the next generation of intelligent agents and applications, and Neon is their open-source, Python-based machine learning library. Established in 2014, Neon enables developers to build, train and deploy deep learning technologies in the cloud. Neon has plenty of video tutorials and a 'model zoo' that houses pre-trained algorithms and example scripts.

  • Apache Spark MLlib

Apache Spark MLlib is the machine learning library of Apache Spark, an in-memory data processing framework. It offers an extensive collection of algorithms for classification, regression, clustering and collaborative filtering. Within the Apache ecosystem there is also an open-source framework called Singa, which provides a programming tool for deep learning networks across multiple machines.

A comparison of the various machine learning platforms is shown in Table 1.

Table 1 Comparison of machine learning platforms

3.2 Deep Learning Platforms

  A. TensorFlow

TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.

TensorFlow is a computational framework for building machine learning models. It provides a variety of toolkits that let you construct models at your preferred level of abstraction. You can use lower-level APIs to build models by defining a series of mathematical operations, or higher-level APIs to specify predefined architectures such as linear regressors or neural networks.

TensorFlow can run on multiple CPUs and GPUs, with optional CUDA and SYCL extensions for general-purpose computing on graphics processing units. It is available on 64-bit Linux, macOS, Windows, Android and iOS. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.

TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives from the operations that such neural networks perform on multidimensional data arrays, which are referred to as tensors. In May 2017, Google announced TensorFlow Lite, a software stack specifically for mobile development. TensorFlow Lite is the variant of TensorFlow intended for mobile and embedded machine learning, and it supports the Android Neural Networks API. Shi et al. [41] found in recent research that TensorFlow currently performs best in a server-grade, multi-threaded execution environment with more than 8 threads.

B. Deeplearning4j

Deeplearning4j (DL4J) is another open-source library, released under the Apache License 2.0. It is commercially supported by the startup Skymind, which bundles DL4J, TensorFlow, Keras and other deep learning libraries in an enterprise distribution called the Skymind Intelligence Layer.

Deeplearning4j is the first commercial-grade, open-source, distributed deep learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.

C. Theano

Theano [42] is a Python library that lets you define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. In Theano, computations are expressed using a NumPy-like syntax and compiled to run efficiently on either CPU or GPU architectures. Theano is an open-source project primarily developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal. Theano also supports tensor operations and GPU computation, runs on Python 2 and 3, and supports parallelism through BLAS and SIMD.

D. Torch

Torch [43] is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to a simple and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. The goal of Torch is to provide maximum flexibility and speed in building scientific algorithms while keeping the process extremely simple. Torch comes with a large ecosystem of community-driven packages for machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking, among others, and builds on top of the Lua community. At the heart of Torch are the popular neural network and optimization libraries, which are simple to use while offering maximum flexibility in implementing complex neural network topologies. Users can build arbitrary graphs of neural networks and parallelize them over CPUs and GPUs efficiently. Torch is constantly evolving: it is already used within Facebook, Google, Twitter, NYU, IDIAP, Purdue and several other companies and research labs (Shi et al. [41]).

E. Microsoft Cognitive Toolkit (CNTK)

The Microsoft Cognitive Toolkit [44] is another deep learning framework, developed by Microsoft Research. It describes neural networks as a series of computational steps via a directed graph. It can be included as a library in C#, Python and C++ programs, or used as a standalone tool with its own scripting language, BrainScript. It can also run model evaluation functions from Java code, and it supports ONNX, an open-source neural network model format that allows models to be transferred between deep learning frameworks such as PyTorch, Caffe2 and MXNet. CNTK has been developed as a computationally powerful tool for machine learning, with performance comparable to other platforms that have seen longer development and more widespread use Shi et al. [45].

F. Caffe and Caffe2

CAFFE (Convolutional Architecture for Fast Feature Embedding) [46,47,48] is a deep learning framework. Since 2014, Caffe has provided a complete toolkit for training, testing, fine-tuning and deploying models, with well-documented examples for these tasks and GPU support, aimed primarily at image classification. A Caffe layer is the essence of a neural network layer, and model definitions are written as configuration files in the Protocol Buffer language. Caffe2, developed by Facebook Research and Facebook Open Source, builds on the original Caffe project, adds a Python API, and supports Mac OS X, Windows, Linux, iOS, Android and other build platforms.

G. MXNet

Apache MXNet [49] is a modern open-source deep learning software framework used to train and deploy deep neural networks. It is a fast and scalable training and inference framework with an easy-to-use, concise API for machine learning, and it has been designed from a systems perspective to minimize data-loading and input/output overhead. In addition, it provides an interface named Gluon that allows developers of all skill levels to get started with deep learning in the cloud, on edge devices and in mobile apps. With a few lines of Gluon code, developers can build linear regression models, convolutional networks and recurrent LSTMs for object detection, speech recognition, recommendation and personalization. MXNet has proved particularly efficient in single- and multi-GPU implementations, whereas its CPU-based implementations appear less efficient.

H. Keras

Keras [50] is another important API developed for neural networks. It is written in Python and can run on top of CNTK, TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Models are described in compact Python code that is easy to debug and easy to extend. A model is understood as a sequence or graph of standalone, fully configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions and regularization schemes are all standalone modules that can be combined to create new models.
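The modular design described above (layers, activation functions and cost functions as standalone pieces that can be combined freely) can be illustrated without Keras itself. The plain-Python sketch below composes hypothetical `Dense`, `ReLU` and `Sequential` classes with a separate mean-squared-error cost; it mirrors the design philosophy only and does not reproduce the Keras API. The weights are hand-picked for illustration, not trained.

```python
from typing import List, Sequence


class Dense:
    """Fully connected layer: out[j] = sum_i x[i] * W[i][j] + b[j]."""

    def __init__(self, weights: List[List[float]], bias: List[float]):
        self.weights = weights  # one row per input, one column per output
        self.bias = bias

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [
            sum(x_i * row[j] for x_i, row in zip(x, self.weights)) + self.bias[j]
            for j in range(len(self.bias))
        ]


class ReLU:
    """Element-wise activation that can sit between any two layers."""

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [max(0.0, v) for v in x]


class Sequential:
    """Chains independent modules; anything callable on a vector can be plugged in."""

    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, x: Sequence[float]) -> List[float]:
        for module in self.modules:
            x = module(x)
        return list(x)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Cost function kept separate from the model, so it can be swapped out."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


model = Sequential(
    Dense([[1.0, -1.0], [0.5, 2.0]], [0.0, 0.0]),  # 2 inputs -> 2 hidden units
    ReLU(),
    Dense([[1.0], [1.0]], [0.5]),                  # 2 hidden units -> 1 output
)
out = model([2.0, 1.0])
print(out, mse(out, [3.0]))
```

Because each module only agrees on "vector in, vector out", replacing `ReLU` with another activation or `mse` with another cost requires no change to the other pieces.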

A comparison of the various deep learning platforms is shown in Table 2.

Table 2 Comparison of deep learning platforms

4 Attack Categories

4.1 Hardware Attacks

As described by Tehranipoor and Koushanfar [51], attacks at the hardware level include manufacturing backdoors, gaining access to memory, and hardware tampering. These attacks have a twofold goal: modifying the hardware to access sensitive information, and creating a backdoor (e.g., installing an invisible program in the hardware circuit) that can later be used to regain access to the compromised machine. Such hardware attacks can target several types of devices, such as network appliances, surveillance systems, and industrial control systems.

4.2 Network Attacks

According to Schweitzer et al. [45], network attacks can target the network protocol or the network device software, and their goal is either denial of service or hijacking a network connection to steal sensitive data. Attacks that frequently use vectors at the network layer include Denial of Service (DoS), IP spoofing and man-in-the-middle attacks, as noted by Desmedt et al. [15].

  • Network layer

Zolotukhin et al. [52] proposed the use of stacked autoencoders (SAE) to detect application-layer DDoS attacks in encrypted traffic. The proposed framework clusters normal traffic patterns and applies anomaly detection to find trivial DoS attacks without decrypting the packets flowing through the network. Moreover, the SAE is used to identify attacks that try to mimic normal user behaviour, based on the reconstruction error of vectorized conversation traffic clusters in time intervals. Kim et al. [53] developed an IDS using an LSTM recurrent neural network. The model is applied to the KDD Cup 1999 dataset and is able to recognize around 22 attack types that fall into four categories. Compared with other neural network systems trained on the same data, the proposed approach shows an improved detection rate and precision, particularly for DoS attacks. DL is also well suited to securing real-world applications, depending on the underlying analysis scenario. Wang et al. [54] proposed a deep convolutional framework for identifying people in images captured by cameras. They first pre-trained the network on the ImageNet dataset and then fine-tuned it on the CUHK03 re-identification dataset.

By modifying the fully connected layers of the model during retraining on the second dataset, the authors substantially increase the matching rate over existing schemes. Niimi [55] explored the use of deep learning for credit card approval decisions and transaction authorization. The author validated the system in R and deployed it on Amazon's EC2 cloud platform. The evaluation showed accuracy comparable to or higher than that of other learning techniques.
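The stacked autoencoders used by Zolotukhin et al. [52] flag traffic whose reconstruction error is unusually high. The principle can be shown with a deliberately tiny stand-in: a single linear autoencoder with one hidden unit and tied weights, trained by gradient descent in plain Python on synthetic two-feature data. Everything here (the data, the learning rate, the three-times-maximum threshold rule) is an illustrative assumption, not the authors' architecture or settings.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def reconstruct(w: Vec, x: Vec) -> Vec:
    # Encoder: 2-D input -> 1-D code; decoder reuses the same (tied) weights.
    z = dot(w, x)
    return (z * w[0], z * w[1])


def error(w: Vec, x: Vec) -> float:
    """Squared reconstruction error, used as the anomaly score."""
    r = reconstruct(w, x)
    return (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2


def train(data: List[Vec], lr: float = 0.05, epochs: int = 2000) -> Vec:
    w = (0.5, 0.1)  # arbitrary non-zero starting point
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x in data:
            z = dot(w, x)
            rec = reconstruct(w, x)
            r = (x[0] - rec[0], x[1] - rec[1])  # residual
            wr = dot(w, r)
            # Gradient of ||x - (w.x) w||^2 with respect to w.
            g0 += -2.0 * (wr * x[0] + z * r[0])
            g1 += -2.0 * (wr * x[1] + z * r[1])
        w = (w[0] - lr * g0 / len(data), w[1] - lr * g1 / len(data))
    return w


# Synthetic "normal" feature vectors lying close to the line x1 = x2.
normal = [(t, t + d) for t, d in [(-1.0, 0.02), (-0.6, -0.03), (-0.2, 0.01),
                                   (0.2, -0.02), (0.6, 0.03), (1.0, -0.01)]]
w = train(normal)
threshold = 3 * max(error(w, x) for x in normal)

for sample in [(0.8, 0.79), (1.0, -1.0)]:
    score = error(w, sample)
    print(sample, round(score, 4), "anomaly" if score > threshold else "normal")
```

The autoencoder only learns to reproduce the structure it saw during training, so a point that breaks that structure (here, one far from the line) cannot be reconstructed well and stands out by its error. A stacked, non-linear autoencoder applies the same idea to much richer traffic features.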


4.3 Application Attacks

Phishing and client-side web attacks are the most common and frequent application attacks, as indicated by major security market players. These attacks target services such as email and web browsers and pose a serious threat on the Internet. In email phishing, the attacker attempts to collect sensitive data, such as credit card numbers, by impersonating a reputable person or organization through email and other communication channels [56, 57]. Many application-level attacks use social engineering techniques that trick users into compromising their own systems and manipulate them into forwarding the attack through deception [58]. A typical example of a client-side web attack is Cross-Site Scripting (XSS), which involves injecting client-side script code such as JavaScript into web pages. The injected code can later be used for other purposes, for example to bypass access controls or to force a user to perform actions on a remote site on behalf of the attacker.
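The following sketch shows why XSS works and one widely used mitigation, output encoding. The two rendering functions are hypothetical; the only library call is Python's standard `html.escape`, which replaces characters such as `<`, `>` and quotes with HTML entities so the browser displays them instead of interpreting them.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    # Vulnerable: user-supplied text is pasted into the page verbatim.
    return "<div class='comment'>" + comment + "</div>"


def render_comment_safe(comment: str) -> str:
    # Output encoding turns markup characters into inert HTML entities.
    return "<div class='comment'>" + html.escape(comment, quote=True) + "</div>"


payload = "<script>alert(document.cookie)</script>"
print(render_comment_unsafe(payload))  # a browser would execute this script
print(render_comment_safe(payload))    # a browser would display it as plain text
```

Encoding must match the output context (HTML body, attribute, JavaScript, URL); this sketch covers only the HTML-body case.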

A large number of application-level attacks can be categorized as malware [59].

Malware is any malicious software that an attacker manages to run on the target computer. It is used to gather sensitive data, to gain access to private computer systems, or to perform large-scale attacks. Malware is characterized by its malicious intent, acting against the user's requirements. Malware can be divided into several classes depending on its design goal. The most common classes are mobile malware, botnets, ransomware and banking malware. Various methods are used to install malware on a target system. For instance, mobile malware is installed via SMS, through unofficial application stores, or by exploiting vulnerabilities of the OS. Once installed, the malware can perform several malicious activities, such as unauthorized access to user data or tracking user activities.

  • Application Layer

Zhu et al. [60] introduced DeepFlow, an Android malware detection tool. It uses FlowDroid to extract the flow of data from sensitive sources to sensitive sinks. The SUSI technique is also used to map these data flows into fine-grained categories. The applications are then classified by a deep belief network that takes the transformed data flows as input. Ding et al. [61] extracted opcode sequences from Windows executables for malware classification using a DBN. After converting the opcode features into n-gram representations, they selected features by maximum information gain and document frequency. The authors demonstrate both the ability of DBNs to perform classification and their use as autoencoders for unsupervised feature selection that improves the performance of shallow learning models such as k-nearest neighbours and decision trees.
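The feature pipeline described for Ding et al. [61] (opcode n-grams, filtered by document frequency and ranked by information gain) can be sketched in plain Python. The opcode traces and labels below are hypothetical toy data, the n-gram features are binary presence indicators (an assumption), and the DBN that consumes the selected features in the original work is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(opcodes: Sequence[str], n: int = 2) -> Set[NGram]:
    """Set of opcode n-grams present in one program (binary presence features)."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def entropy(labels: List[int]) -> float:
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples: List[Set[NGram]], labels: List[int], feature: NGram) -> float:
    """IG = H(Y) - H(Y | feature present or absent)."""
    with_f = [y for s, y in zip(samples, labels) if feature in s]
    without_f = [y for s, y in zip(samples, labels) if feature not in s]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_features(samples: List[Set[NGram]], labels: List[int],
                    k: int = 3, min_df: int = 1) -> List[NGram]:
    # Document-frequency filter first, then rank survivors by information gain.
    df = Counter(g for s in samples for g in s)
    candidates = [g for g, c in df.items() if c >= min_df]
    ranked = sorted(candidates, key=lambda g: (-information_gain(samples, labels, g), g))
    return ranked[:k]


# Toy disassembly (1 = malware, 0 = benign); real opcode traces are far longer.
programs = [
    (["push", "mov", "xor", "jmp", "call"], 1),
    (["mov", "xor", "jmp", "xor", "jmp"], 1),
    (["push", "mov", "call", "ret"], 0),
    (["push", "mov", "add", "ret"], 0),
]
samples = [ngrams(ops) for ops, _ in programs]
labels = [y for _, y in programs]
print(select_features(samples, labels, k=2))
```

In this toy set, the bigrams `("mov", "xor")` and `("xor", "jmp")` occur only in the malware samples and therefore carry the maximum information gain of 1 bit.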

Detecting ongoing attacks in real time is essential to enable timely response and mitigation. Uwagbole et al. [62] designed a framework to detect and prevent SQL injection attacks through hybrid static and dynamic analysis using deep learning techniques. Their proxy-based framework combines pattern matching with numerical feature encoding for neural network and logistic regression classification. A comparison of the types of deep learning attacks in different layers is shown in Table 3.

Table 3 Summary of the types of deep learning attacks in different layers
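The combination of pattern matching and numerical feature encoding used by Uwagbole et al. [62] can be sketched as follows. The signature list and the five features are hypothetical choices for illustration, not the authors' feature set, and the neural network or logistic regression model that would consume the vectors is omitted.

```python
import re
from typing import List

# Hypothetical signature list; production rule sets are much larger.
SIGNATURES = [
    re.compile(r"'\s*or\s+\d+\s*=\s*\d+", re.IGNORECASE),  # tautology such as ' OR 1=1
    re.compile(r"union\s+select", re.IGNORECASE),           # UNION-based data extraction
    re.compile(r"(--|#|/\*)"),                              # comment that truncates a query
]
KEYWORDS = ("select", "union", "insert", "drop", "or", "and", "sleep")


def encode(query: str) -> List[float]:
    """Map a raw input string to a fixed-length numeric vector for a classifier."""
    tokens = re.findall(r"[a-z_]+", query.lower())
    return [
        float(any(sig.search(query) for sig in SIGNATURES)),  # pattern-match flag
        float(query.count("'")),                              # quote characters
        float(sum(tok in KEYWORDS for tok in tokens)),        # SQL keyword count
        float(sum(not ch.isalnum() and not ch.isspace() for ch in query)),  # symbols
        float(len(query)),                                    # input length
    ]


print(encode("alice"))
print(encode("' OR 1=1 --"))
```

The pattern flag alone reproduces a signature-based filter; the extra numeric features give a learned classifier a chance to catch injections that evade the fixed signatures.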

5 Related Research Review

The three important fields in which most cyber security ML algorithms are finding application are intrusion detection, malware analysis, and spam detection. An overview of each is given below.

Intrusion detection, as discussed by Pierazzi et al. [64], aims to find illegal activities within a computer or a network through an IDS. Network IDSs are widely deployed in modern enterprise networks. These systems were traditionally based on patterns of known attacks, but modern deployments incorporate other approaches for anomaly detection, threat detection and classification based on machine learning. Within the broader intrusion detection area, two specific problems are relevant to this analysis: the detection of botnets and of Domain Generation Algorithms (DGA). A botnet is a network of infected machines controlled by attackers and misused to conduct various illegal activities. Botnet detection aims to identify communications between infected machines within the botnet network and the external command-and-control servers. Despite many research proposals and commercial tools that address this threat, several botnets still exist. A DGA automatically generates domain names and is often used by an infected machine to communicate with external server(s) by periodically generating new hostnames. DGAs represent a real threat for organizations because, through DGAs, which rely on language processing techniques, it is possible to evade defences based on static blacklists of domain names. The authors considered ML-based DGA detection strategies.
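To make the DGA threat concrete, the sketch below implements a hypothetical toy DGA: the bot and its controller both hash a shared seed and the current date, so they derive the same fresh domain names each day without any static list to blacklist. It also computes character entropy, one simple lexical feature that ML-based DGA detectors can use; this is not the specific detection strategy of Pierazzi et al. [64], and the seed string is made up.

```python
import hashlib
import math
from collections import Counter
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 3, length: int = 12) -> List[str]:
    """Hypothetical DGA: bot and controller derive the same names from seed + date."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{day.isoformat()}:{i}".encode()).digest()
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + ".com")
    return domains


def char_entropy(label: str) -> float:
    """Shannon entropy of the characters in a domain label, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


today = toy_dga("botnet-seed", date(2024, 1, 15))
print(today)
print(round(char_entropy("google"), 3), round(char_entropy("xjw3k9qpz7vb"), 3))
```

Algorithmically generated labels tend to look random, so their character entropy is usually higher than that of human-chosen names; real detectors combine many such features with a trained classifier.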

Malware analysis is an extremely relevant problem because modern malware can automatically generate novel variants that have identical malicious effects but appear as completely different executable files. These polymorphic and metamorphic features defeat traditional rule-based malware identification approaches. ML techniques can be used to detect malware variants and attribute them to the correct malware family.

Spam and phishing detection comprises a large set of techniques for reducing the waste of time and the potential hazards caused by unwanted emails. Phishing emails represent a route through which attackers enter an enterprise network. Attackers use advanced evasion strategies that make spam and phishing detection increasingly difficult. The spam detection process can be improved by using ML approaches.
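A classic ML baseline for spam filtering is a multinomial naive Bayes classifier. The sketch below implements one in plain Python with add-one (Laplace) smoothing; the four-message training corpus is a hypothetical toy example, and real filters use far larger corpora and richer features.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        for text, label in docs:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior plus the sum of smoothed log likelihoods of each known word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


training = [
    ("win a free prize now", "spam"),
    ("claim your free reward today", "spam"),
    ("meeting notes for the project", "ham"),
    ("lunch with the project team today", "ham"),
]
clf = NaiveBayes().fit(training)
print(clf.predict("free prize waiting"), clf.predict("project meeting moved"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and smoothing keeps an unseen word-class pair from zeroing out a class.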

Many approaches based on supervised and unsupervised methods have been introduced to tackle the situations mentioned above. Frameworks in which additional constraints are introduced into the neural network are called Bridged Multi-Layer Perceptron (BMLP) structures. A tensor autoencoder is used to learn features from heterogeneous data, and such autoencoders are stacked to build a tensor deep learning model that learns multiple levels of data representation. Tensor-based data representations are used to model nonlinear relationships in the data. The performance of the proposed tensor deep learning model is compared against the stacked autoencoder on representative classification datasets such as STL-10, CUAVE, SNAE2 and INEX 2007B (Jan et al. [65]).

According to Diro et al. [66], cyber security is a serious problem in the computing sector as the demand for computing keeps increasing. Thousands of zero-day attacks are also emerging in the Internet of Things (IoT). Even advanced mechanisms such as traditional machine learning face difficulty in detecting these cyber-attacks, whereas the success of deep learning (DL) can address many of the issues faced by cyber security. Deep learning has become practical thanks to improved CPUs and neural network algorithms. DL succeeds in detecting slightly mutated and novel attacks because of its strong feature-extraction capabilities. The self-taught learning and compression capabilities of deep learning architectures are the key mechanisms for discriminating attacks from benign traffic. The approach is compared with traditional machine learning, and the distributed detection system is evaluated against a centralized one.

In vehicular security, Kang et al. [67] applied a deep learning approach to intrusion detection, where detection accuracy was improved by deep belief networks (DBN) with unsupervised pre-training. Another study in this category was conducted by Wu et al. [68], who proposed an IDS that detects anomalies using autoencoders on artificial data. Such artificial data do not reflect the malicious and normal behaviour of real-time networks. The authors also adopted a centralized approach, which is impractical for distributed applications such as the social Internet of Things in smart-city networks.

A deep learning approach that uses autoencoders for feature extraction and Deep Belief Networks (DBN) as the classifier has been applied to malicious code detection [69]. The hybrid mechanism achieves higher accuracy and time efficiency than a single DBN. However, this research does not handle distributed training or the sharing of updated parameters. Another paper that investigated a deep learning scheme for malicious code detection is Wang et al. [17], which applied a denoising autoencoder to learn deeper features and distinguish malicious JavaScript code from normal code. The resulting accuracy is good in the best-case scenario, but the approach can hardly be applied to distributed IoT/fog systems. By contrast, the deep learning model of Diro et al. [66] enables parallel training and parameter sharing by local fog nodes and detects network attacks in distributed fog-to-things networks.
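The text does not specify how locally trained parameters are combined, so the following sketch assumes one common scheme: each fog node trains on its own data and a coordinator takes a sample-weighted average of the nodes' parameter vectors, as in the aggregation step of federated averaging. The node values and sample counts are hypothetical, and this is not claimed to be the exact protocol of Diro et al. [66].

```python
from typing import List, Sequence


def average_parameters(node_params: List[Sequence[float]], node_sizes: List[int]) -> List[float]:
    """Weighted average of flattened parameter vectors, one vector per fog node.

    Each node contributes in proportion to the number of local training samples
    it used, so a node with more data pulls the shared model further.
    """
    if not node_params or len(node_params) != len(node_sizes):
        raise ValueError("need one sample count per parameter vector")
    total = sum(node_sizes)
    dim = len(node_params[0])
    return [
        sum(params[k] * size for params, size in zip(node_params, node_sizes)) / total
        for k in range(dim)
    ]


# Three hypothetical fog nodes after one round of local training.
local = [[0.2, -1.0, 0.5], [0.4, -0.8, 0.7], [0.0, -1.2, 0.3]]
samples_per_node = [100, 300, 100]
shared = average_parameters(local, samples_per_node)
print([round(v, 3) for v in shared])
```

Only parameters, not raw traffic, leave each node, which keeps communication small and lets detection training scale with the number of fog nodes.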

Al-Qurishi et al. [70] note that Sybil attacks are increasing in social networks, where Sybil accounts are emerging in large numbers. The operators of these accounts keep adopting new techniques to avoid detection. Existing Sybil detection techniques are of limited use in preventing and controlling such attacks and must be updated with new data and better-developed strategies. The authors proposed a prediction system built on a deep learning model with three integrated modules: a data harvesting module, a feature extraction mechanism and a deep regression model. The model then evaluates the optimized data fed to it.

Chen et al. [71] observe that smartphone security is threatened by mobile malware. Attackers can poison the training data, which renders existing machine-learning-based malware detection tools ineffective. KUAFUDET addresses this problem: it learns to detect mobile malware in an adversarial setting and is scalable. KUAFUDET has two phases, an offline training phase and an online detection phase. The offline phase trains a classifier on the training set, and the online phase uses this classifier for detection. The two phases are interlinked by a self-adaptive learning scheme to cope with the adversarial environment, and an automated camouflage detector filters suspicious false negatives and feeds them back into the offline training phase.

Pachauria et al. [72] note that medical wireless sensor networks suffer from a wide range of faults and anomalies. Many techniques have been developed to address this, but they are not yet adequate, so machine learning algorithms are being explored. Their approach was evaluated on real-time medical data and detects sensor faults quickly and accurately.

Rehman et al. [73] identify malware detection as a key factor in smartphone security. However, the signature-based methods currently in use do not detect zero-day attacks and polymorphic viruses accurately. To address this, the authors propose a hybrid malware detection framework that uses both signature-based and heuristic-based detection. They applied machine learning algorithms to features extracted from files and binaries and, after extensive experimentation, found that combining signature and heuristic detection improves accuracy.

Hai et al. [74] observe that industrial anti-virus tools detect malware using signature-based techniques. Malware such as metamorphic or polymorphic viruses uses techniques such as mutation and dynamically executed content (DEC) to escape detection tools; packing and calling external code are examples of DEC methods used by malware programs. New techniques are emerging to detect suspicious behaviour across mutated virus samples, one of which is the Control Flow Graph (CFG). However, CFGs generated by binary analysis tools such as IDA Pro do not reflect DEC behaviour, and analysing CFGs from binaries is costly, which makes this approach unsuitable for real-world applications. An improved form of CFG that reflects DEC behaviour is called the lazy-binding CFG. Because deep learning performs well at image recognition on large datasets, the lazy-binding CFG is converted into an image-based representation that is classified with deep learning. This technique has been applied to malware detection on real-world applications with high accuracy.

Pajouh et al. [75] note that Internet of Things devices are employed in many fields for different needs. Because IoT devices have considerable capabilities and widespread applications, they are an attractive target for attacks and malware. IoT malware can be detected with a Recurrent Neural Network (RNN) deep learning approach that analyses the opcodes of ARM-based IoT applications. The trained model was evaluated with three different long short-term memory (LSTM) configurations, and the LSTM-based approach gave the best results.
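The sketch below shows what an LSTM does with an opcode sequence: each opcode is one-hot encoded and fed through the standard LSTM gate equations, updating a cell state and a hidden state step by step. To stay short it uses a hidden size of 1, a hypothetical four-opcode ARM vocabulary and arbitrary untrained weights; it is not the configuration evaluated by Pajouh et al. [75].

```python
import math
from typing import List, Sequence, Tuple

OPCODES = ["mov", "push", "bl", "ldr"]  # hypothetical ARM opcode vocabulary


def one_hot(op: str) -> List[float]:
    return [1.0 if op == o else 0.0 for o in OPCODES]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x: Sequence[float], h: float, c: float, params) -> Tuple[float, float]:
    """One LSTM time step with a scalar hidden state (hidden size 1 for brevity)."""

    def gate(name: str) -> float:
        w_x, w_h, b = params[name]
        return sum(wi * xi for wi, xi in zip(w_x, x)) + w_h * h + b

    f = sigmoid(gate("forget"))        # how much of the old cell state to keep
    i = sigmoid(gate("input"))         # how much new information to write
    o = sigmoid(gate("output"))        # how much of the cell state to expose
    g = math.tanh(gate("candidate"))   # candidate value to write
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


# Untrained, hand-picked weights purely for illustration: (input weights, recurrent weight, bias).
params = {
    "forget": ([0.5, 0.1, -0.3, 0.2], 0.4, 1.0),
    "input": ([0.3, -0.2, 0.6, 0.1], -0.5, 0.0),
    "output": ([0.2, 0.2, 0.2, 0.2], 0.1, 0.0),
    "candidate": ([1.0, -1.0, 0.5, -0.5], 0.3, 0.0),
}

h, c = 0.0, 0.0
for op in ["push", "mov", "ldr", "bl", "mov"]:
    h, c = lstm_step(one_hot(op), h, c, params)
print(round(h, 4))  # final hidden state; a trained classifier layer would consume it
```

The forget gate is what lets the network carry information about early opcodes across long sequences, which is why LSTMs are a natural fit for opcode traces.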

Ravì et al. [76] note the growing adoption of wearable devices in fields such as sports, wellbeing and healthcare. Because these applications require extensive analysis and classification, deep learning techniques are preferred; however, these techniques need a high-performance computing environment and are therefore inefficient on wearable devices. The authors therefore combined features learned from inertial sensor data with other data for accurate, real-time classification. To overcome the limitations of deep learning on such devices, they proposed on-node computation, in which spectral-domain pre-processing of the sensor data is performed on the node before the data are sent on. The classification accuracy of the proposed method was compared against state-of-the-art methods using both real-world and laboratory data, and the authors showed that the on-node approach is suitable for smartphones and wearable devices.

He et al. [77] observe that the application of computing and communication intelligence has greatly improved the monitoring and control of smart grids, but the increased dependence on IT also increases vulnerability to attacks. False data injection (FDI) attacks, which compromise data integrity, are a severe threat to supervisory control and data acquisition (SCADA) systems. In this work, a deep learning technique is used to recognize the behavioural features of FDI attacks from measurement data and to use them to detect FDI attacks. In this way, the proposed detection mechanism effectively mitigates potential attacks with improved accuracy. The performance of the strategy is illustrated on the IEEE 118-bus test system, and its scalability is evaluated on the IEEE 300-bus test system.

Hasana et al. [78] note that Optical Burst Switching (OBS) networks are seriously affected by a denial-of-service attack known as the Burst Header Packet (BHP) flooding attack. In this attack, malicious BHPs are flooded into the network without any corresponding data burst, which leads to poor network performance, data loss, denial of service and low bandwidth utilization. Machine-learning-based predictive analysis can identify these attacks in OBS networks, but traditional machine learning approaches such as Naïve Bayes and k-nearest neighbours are not well suited to the task. To overcome this, a deep convolutional neural network model is introduced that detects the attack at a very early stage.

Liu et al. [79] note that deep learning has been widely used for network attack detection. They use deep learning models to analyse packet payloads, proposing a convolutional-neural-network-based payload classification approach (PL-CNN) and a recurrent-neural-network-based payload classification approach (PL-RNN). Both approaches are efficient and practical for attack detection.
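The internal architecture of PL-CNN is not described here, so the sketch below only illustrates the basic building blocks such a payload classifier relies on: encoding raw bytes as a fixed-length numeric sequence, sliding a 1-D convolution filter over it, and global max pooling to get a position-independent feature. The filter is hand-set and hypothetical; a trained model would learn many filters and stack further layers.

```python
from typing import List, Sequence


def encode_payload(payload: bytes, max_len: int = 16) -> List[float]:
    """Scale each byte to [0, 1] and pad or truncate to a fixed length."""
    values = [b / 255.0 for b in payload[:max_len]]
    return values + [0.0] * (max_len - len(values))


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(signal) - k + 1)
    ]


def global_max_pool(feature_map: Sequence[float]) -> float:
    return max(feature_map)


# Hypothetical hand-set filter that responds when a byte is followed by a larger one.
kernel = [-1.0, 1.0]
for payload in [b"GET /index.html", b"\x00\x00\x90\x90\xff"]:
    x = encode_payload(payload)
    print(payload, round(global_max_pool(conv1d(x, kernel)), 3))
```

Max pooling keeps only the strongest response of each filter, so a suspicious byte pattern is detected wherever it appears in the payload.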

Dong et al. [80] observe that many traditional machine learning methods have been employed in intrusion detection, but they suffer from poor detection performance and accuracy. For network situation assessment, intrusion detection draws its data from monitored security events. Introducing deep learning models can improve both detection performance and accuracy.

Loukas et al. [3, 81] address a growing area: the detection of cyber-attacks on vehicles. Vehicles have limited, lightweight processing resources. At the same time, attacks against vehicles are difficult to characterize in advance, which often makes signature-based intrusion detection systems less practical than behaviour-based ones, the opposite of the situation in conventional computing systems. Detection can be improved with computation offloading, a technique used by resource-constrained mobile devices. Real-time data from the vehicle's cyber and physical processes are sent as time series to a neural network architecture, which makes detection more reliable and accurate.

Al-Hawawreh et al. [21] note that Internet industrial control systems, which connect the technical parts of applications with physical processes, are now threatened by cyber-attacks. This can be addressed with deep learning models that combine a deep autoencoder and a deep feed-forward neural network architecture, evaluated on datasets such as NSL-KDD and UNSW-NB15. The models learn from and are validated on data collected from TCP/IP packets, which the authors present as a powerful technique.

Yin et al. [37] note that intrusion detection plays a vital role in information security by identifying various attacks in the network. Their recurrent neural network based IDS (RNN-IDS) is well suited to building classification models, and it improves both the accuracy and the performance of intrusion detection.

Tang et al. [82] note that software-defined networking (SDN) can improve network security through its logically centralized controller and global view of the network. They apply a deep learning approach to flow-based anomaly detection in an SDN environment, training a deep neural network (DNN) model for intrusion detection on the NSL-KDD dataset to improve performance.

Feng et al. [83] proposed a plug-and-play device to detect denial-of-service and other security attacks and to prevent detected attacks from spreading on a larger scale. In their study, deep LSTM and deep neural network (DNN) detection models are used to detect DoS attacks, and a CNN detection model is used to detect XSS and SQL injection attacks. They demonstrated that these detection models achieve very high accuracy, recall, precision and F1-score, and that the time efficiency of the LSTM, CNN and DNN models is within an acceptable range. They concluded that, with slight modifications, the proposed technique can also be applied to attack detection in ad-hoc networks.
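Accuracy, precision, recall and F1-score, the metrics reported by Feng et al. [83], all derive from the four cells of a binary confusion matrix. The sketch below computes them for hypothetical toy labels; it shows why the metrics are reported together, since on imbalanced traffic a detector can have high accuracy while still missing many attacks.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Metrics for a binary detector, treating label 1 as 'attack'."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real attacks
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of attacks that raise an alarm
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 1 = attack (e.g. DoS or injection attempt), 0 = legitimate request.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))
```

F1 is the harmonic mean of precision and recall, so it stays low unless the detector both raises few false alarms and misses few attacks.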

Shenfield et al. [84] present an approach to detecting malicious network traffic using artificial neural networks, suitable for use in intrusion detection systems based on deep packet inspection. Experimental results are obtained using a range of typical network traffic data (images, dynamic link library files, and a selection of other files such as logs, music files and word processing documents) and malicious shellcode files sourced from the online exploit and vulnerability repository exploitdb.

Jan [65] has demonstrated that the proposed ANN architecture can accurately distinguish between benign and malicious network traffic. The proposed ANN architecture achieves a mean accuracy of 98%, a mean area under the receiver operating characteristic curve of 0.98, and a mean false positive rate of less than 2% in repeated 10-fold cross-validation. This shows that the classification approach is robust, accurate and precise. The novel approach to malicious network traffic detection proposed in this paper has the potential to enhance intrusion detection for both conventional and cyber-physical systems.
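The evaluation protocol quoted above (k-fold cross-validation, area under the ROC curve, false positive rate) can be sketched in plain Python. The AUC is computed through its rank interpretation: the probability that a randomly chosen malicious sample receives a higher score than a randomly chosen benign one, with ties counting one half. For simplicity the folds here are contiguous, whereas practical evaluations usually shuffle (and often stratify) the data first; the labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the test set once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random malicious sample outscores a random benign one."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of benign samples wrongly flagged as malicious."""
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    negatives = sum(t == 0 for t in y_true)
    return fp / negatives if negatives else 0.0


labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]
preds = [int(s >= 0.5) for s in scores]  # hypothetical decision threshold of 0.5
print(roc_auc(labels, scores), false_positive_rate(labels, preds))
print([len(test) for _, test in k_fold_indices(23, k=10)])
```

AUC summarizes ranking quality across all thresholds, while the false positive rate depends on the chosen threshold, which is why both are typically reported, averaged over the folds.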

Li et al. [85] propose a hybrid malicious code detection scheme based on an autoencoder and a DBN (Deep Belief Network). The autoencoder is used as a deep learning technique to reduce data dimensionality, converting complicated high-dimensional data into low-dimensional codes through a nonlinear mapping. The DBN is composed of multiple layers of restricted Boltzmann machines (RBM) and a back-propagation (BP) neural network layer. Each RBM layer is trained by unsupervised learning, with its output vector serving as the input to the next layer. After the test samples are fed into the hybrid model, the experimental results show that the detection accuracy of the proposed hybrid method is higher than that of a single DBN. The proposed method also reduces time complexity and achieves better detection performance.

Niyaz et al. [86] note that a network IDS helps administrators detect security breaches in their organizations. There are many challenges in building a flexible and effective NIDS for unforeseen and unpredictable attacks. The authors proposed a deep learning based approach for developing such an efficient and flexible NIDS, applying Self-taught Learning, a deep learning based technique, to NSL-KDD, a benchmark dataset for network intrusion detection.

The summary of the various attacks, algorithms and solutions is given in Table 4.

Table 4 Summary of the various attacks, algorithms and solutions

6 Conclusion

Machine learning and deep learning techniques are inspired by the ability of the human brain to learn from previous experience. These techniques have been widely adopted across research areas to solve domain problems. Cyber security is receiving increasing attention with growing Internet usage and the wide variety of network applications. In this paper we have presented a literature review of different ML/DL methods for addressing cyber security attacks. Recent ML and DL tools and platforms are highlighted, and security solutions for the different categories of attacks are discussed and summarized.