1 Introduction

The emergence of communication technologies and the incursion of networking tools and services into day-to-day life have established an omnipresent IoT model that empowers automation through self-directed functions and distributed computation in large-scale networks. Alongside the physical platform, the IoT model includes a communication network for accumulating and exchanging beneficial data so that the benefits of IoT can be fully realized. Despite the huge number of IoT tools and deployments, the feasibility of IoT still lags behind its theoretical description. Network functionality and robustness depend directly on the network structure, and the disturbance of even a few devices in an IoT platform can pose terrible threats to its operation, which is a major issue of IoT [1]. While IoT offers several advantages, it also maximizes the danger of exposure to privacy and security risks. In IoT, security risks go far beyond information risk or Denial of Service (DoS): they now extend to real life, including physical security, and further issues concern privacy [2]. The growth in the count and erudition of unidentified cyber-attacks has cast a shadow over the deployment of smart devices. The heterogeneity and distribution of IoT tools and services make IoT security a complex process. Moreover, attack detection in IoT is fundamentally different from classical techniques owing to the special service needs of IoT [3].

Cyber security is an imperative factor to be employed throughout the world across all promising technologies, such as communication, networking, and information processing. In addition, other important techniques used in IoT, such as cloud computing, social networks, and barcodes, raise cyber security issues. IoT applications such as the Smart Home suffer the most from cybercrime and have therefore been evaluated by several countries, such as China, the U.K., and the U.S.A., which have enacted extraordinary laws to ensure IoT security. The risk of cyber security in IoT grows with the count of devices linked to the network, as these devices make their data a target for attacks by hackers [4]. Owing to the rapid emergence of IoT devices, a huge number of cyber-attacks target these kinds of devices, and the majority of IoT attacks are considered to be botnet-based. In addition, although several classical rule-based detection models exist, classical Intrusion Detection Systems (IDS) cannot be deployed on IoT platforms because of the resource constraints of these devices [5]. If an IoT platform is violated, attackers can disperse IoT data to unauthenticated parties, which can influence the reliability and accuracy of IoT data over the complete life cycle. Hence, such cyber-attacks need to be solved for safer IoT usage. Presently, various measures have been taken for managing the security problems of the IoT model, and the majority of recent cyber security techniques are devised by coupling the domains of machine learning and cyber security [6].

Recently, there has been huge attention on the use of Artificial Intelligence (AI) methods, like Deep Learning (DL) and Machine Learning (ML), for developing cyber security solutions, including malware detection, privacy-preserving methodologies, forensic examination, and threat intelligence. DL-based techniques involve a learning strategy with various layers, each of which possesses an imperative count of computational nodes. However, the design of a good AI-based IoT attack discovery model remains a major issue [7]. The major ML techniques utilized for predicting cyber security attacks include Naïve Bayes, Random Forest (RF), and so on; DL is a sophisticated form of ML that expands artificial neural network techniques [4]. In [8], a cyber-attack detection model was devised for dealing with sinkhole attacks targeting IoT devices, attaining elevated detection accuracy in mobile cases. In [9], a battery-exhaustion prevention model was devised on the basis of a mask network and Bluetooth Low Energy (BLE). In [10], a DDoS attack detection model was devised for handling big data to detect attacks. In [11], an entropy-based DDoS attack measurement based on packet size was devised, which helped to find DDoS attacks with Software-Defined Networking (SDN). Many techniques have been revealed for discovering cyber-attacks in IoT, but most focus on detection models for particular IoT threats [5].

This paper devises a method for cyber forensic investigation using deep learning-based feature fusion in big data-based IoT. The IoT network is simulated, and the accumulated data are routed towards the base station (BS) using the Fractional Gravitational Search Algorithm (FGSA). Subsequent to routing, cybercrime detection is performed at the BS, where the data is partitioned using an enhanced Fuzzy C-Means (eFCM). MapReduce is employed, wherein the mappers perform feature fusion using mutual information and a Deep Quantum Neural Network (DQNN), while the reducer performs cybercrime detection using a Deep Belief Network (DBN) trained with the proposed FrMSO algorithm.

The main contributions are:

  • FrMSO-based DBN for cyber attack discovery: The FrMSO-based DBN is employed to identify cyber attacks. Here, the DBN weights are updated with FrMSO, which generates the best weights for tuning the DBN and thereby attains effective outcomes.

  • FrMSO: The optimizer is developed by integrating Fractional Calculus (FC), the Mayfly Optimization Algorithm (MA), and the Shuffled Shepherd Optimization Algorithm (SSOA).

The rest of the paper is organized as follows: Sect. 2 reviews prior cyber attack discovery models. Section 3 presents the IoT system model. Section 4 presents the devised methodology for detecting cyber attacks. Section 5 evaluates the ability of the developed technique against other methodologies. Section 6 concludes.

2 Literature Review

Eight priorly presented cyber attack discovery methodologies are inspected. Soe et al. [12] developed correlated-set thresholding on gain-ratio (CST-GR) for choosing the essential features for determining cyber-attacks. The detection model was lightweight and ran on Raspberry Pi modules, and the essential features linked with each attack allowed the classifier to process quickly. However, the method was unable to discover other types of attacks on a real platform. To identify other attacks, Samy et al. [13] devised a comprehensive attack detection module for detecting various types of cyber-attacks in IoT with DL. The developed model executed the attack detector on fog nodes, owing to their disseminated nature, elevated computation capacity, and nearness to the edge devices. However, labeling the data accumulated from the edge layer poses a major issue and makes the process complicated. To reduce complexity, Gopalakrishnan et al. [14] developed DL-based traffic prediction with a data offloading mechanism and cyber-attack detection (DLTPDO-CD). Here, bidirectional long short-term memory (BiLSTM) was utilized for data offloading; thereafter, Adaptive Sampling Cross Entropy (ASCE) was utilized to increase network throughput and make effective decisions; finally, a DBN trained with the Barnacles Mating Optimizer (BMO) algorithm was adapted for detecting cyber-attacks. However, the relaxation process may degrade performance. To improve performance, Kumar et al. [15] devised a hybrid feature-reduced model for detecting cyber attacks. Feature ranking was done with the correlation coefficient, and Random Forest (RF) was utilized to offer various feature sets, which were integrated with an AND operation. However, the method was not able to increase accuracy. To elevate accuracy, Gurpal Singh Chhabra et al. [16] developed a generalized forensic model with Google's programming model and MapReduce for detecting cyber attacks. Here, tools like Hive, Hadoop, R, and Mahout were utilized for attaining parallel processing. This method is scalable, but it suffered from elevated processing time. To reduce processing time, Huma et al. [17] devised a hybrid deep random neural network (HDRaNN) for detecting cyber attacks in IoT, but it was unable to defy other attacks. To handle other attacks, Sabaresan Venugopal et al. [18] developed a Sunflower Jaya Optimization-based Deep stacked autoencoder (SFJO-based Deep SAE) for detecting cyber attacks. The Deep SAE was trained using SFJO, devised by blending the control parameters of Jaya optimization into the Sunflower optimization algorithm, but it suffered from elevated computational complexity. To reduce computational complexity, Abbas Karimi et al. [19] developed a pseudo-label technique for optimizing a neural network to identify cyber attacks. The database was split into two modules, namely unlabeled and labeled, and the trained module was utilized for evaluating the labels of unlabeled samples with pseudo-labels, but this technique did not examine image and text databases.

3 System Model

With emerging technologies, new issues arise, and one of them is cyber attacks. Web hacks and spreading viruses are familiar to everyone, and such attacks now reach into IoT. Moreover, the extensive usage of IoT raises cyber security problems owing to the rise in attacks. Thus, the intention is to develop an effectual cyber forensic model using IoT infrastructures. Figure 1 reveals the IoT structure for malware detection. The IoT [20] comprises several sensor nodes connected through a wireless network. Furthermore, the IoT model contains three kinds of nodes, namely normal nodes, cluster heads (CH), and the BS. The normal nodes transmit data to the CH, which sends the accumulated data to the BS.

Fig. 1

IoT model

The IoT model consists of one BS \(F_{j}\), CHs \(K_{l}\), and \(g\) nodes. Nodes, each with a maximum radio communication range, are uniformly dispersed over a field of \(M_{q}\) by \(N_{q}\) meters. The best position of the sink node in the IoT is expressed as \(\{ 0.5M_{q} ,\,0.5N_{q} \}\), and the coordinate values \(M_{o}\) and \(N_{o}\) express each IoT node position.
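As an illustration, a minimal Python sketch of this deployment is given below; the field size, node count, and random seed are assumed values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical field size (meters) and node count; the paper leaves these open.
M_q, N_q = 100.0, 100.0
g = 50

# Nodes are uniformly dispersed over the M_q x N_q field.
nodes = np.column_stack((rng.uniform(0, M_q, g), rng.uniform(0, N_q, g)))

# The base station (sink) sits at the field centre {0.5*M_q, 0.5*N_q}.
base_station = np.array([0.5 * M_q, 0.5 * N_q])
```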

3.1 Energy Model

The IoT contains a large number of nodes, each of which starts with some initial energy, represented as \(Q_{0}\); the node batteries are non-rechargeable [20]. The energy dissipation of a normal node is formulated as,

$$T_{emt} (\varepsilon_{y}^{x} ) = T_{ele} *O_{n} + T_{pa} *O_{n} *||\varepsilon_{y}^{x} - \varepsilon_{z}^{w} ||^{4} \,\,\,;If\,\,||\varepsilon_{y}^{x} - \varepsilon_{z}^{w} || \ge s_{0}$$
(1)
$$T_{emt} (\varepsilon_{y}^{x} ) = T_{ele} *O_{n} + T_{fs} *O_{n} *||\varepsilon_{y}^{x} - \varepsilon_{z}^{w} ||^{2} \,\,\,;If\,\,||\varepsilon_{y}^{x} - \varepsilon_{z}^{w} || < s_{0}$$
(2)
$$s_{0} = \sqrt {\frac{{T_{fs} }}{{T_{pa} }}}$$
(3)

where \(T_{fs}\) is the free-space amplification energy, \(O_{n}\) refers to the packet size, \(T_{pa}\) denotes the multipath fading amplification energy, and \(||\varepsilon_{y}^{x} - \varepsilon_{z}^{w} ||\) is the distance between the normal node and the CH. The electronic energy is given as,

$$T_{ele} = T_{txr} + T_{DA}$$
(4)

where \(T_{txr}\) refers to the energy consumed by the transmitter circuitry and \(T_{DA}\) symbolizes the energy consumed while aggregating data. Whenever the CH node receives \(O_{n}\) data bytes, the energy dissipated at the CH is given as,

$$T(\varepsilon_{z}^{x} ) = T_{ele} *O_{n}$$
(5)

After the CH broadcasts and receives data, the IoT node energies are renewed to \(T_{\ell + 1} (\varepsilon_{y}^{x} )\) and \(T_{\ell + 1} (\varepsilon_{z}^{x} )\), formulated as,

$$T_{\ell + 1} (\varepsilon_{y}^{x} ) = T_{\ell } (\varepsilon_{y}^{x} ) - T_{emt} (\varepsilon_{y}^{x} )$$
(6)
$$T_{\ell + 1} (\varepsilon_{z}^{x} ) = T_{\ell } (\varepsilon_{z}^{x} ) - T_{emt} (\varepsilon_{z}^{x} )$$
(7)

The energy update of each node is repeated until all nodes in the network run out of energy and become dead nodes.
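The following Python sketch illustrates the first-order radio model of Eqs. (1)–(7); the parameter values (\(T_{ele}\), \(T_{fs}\), \(T_{pa}\), \(Q_{0}\)) are typical assumed values rather than ones reported here.

```python
import numpy as np

# Assumed radio parameters (typical first-order radio-model values, not from the paper).
T_ele = 50e-9       # electronics energy per bit (J/bit)
T_fs  = 10e-12      # free-space amplifier energy (J/bit/m^2)
T_pa  = 0.0013e-12  # multipath amplifier energy (J/bit/m^4)
s0 = np.sqrt(T_fs / T_pa)          # distance threshold, Eq. (3)

def transmit_energy(packet_bits, distance):
    """Energy spent by a normal node sending packet_bits over distance, Eqs. (1)-(2)."""
    if distance >= s0:
        return T_ele * packet_bits + T_pa * packet_bits * distance ** 4
    return T_ele * packet_bits + T_fs * packet_bits * distance ** 2

def receive_energy(packet_bits):
    """Energy dissipated at the CH when receiving packet_bits, Eq. (5)."""
    return T_ele * packet_bits

# Residual-energy update, Eqs. (6)-(7): a node is dead once its energy is exhausted.
energy = 0.5                       # initial energy Q_0 (J), assumed
energy -= transmit_energy(4000, 87.0)
```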

3.2 Routing Using FGSA

In IoT, routing data along the best path is not an easy task and raises energy problems because of limited battery capacity. An energy crisis can occur during transmission, and these issues can be mitigated with FGSA [20]. FGSA is generated by blending the advantages of GSA and fractional theory. By transmitting data along the best route, the exchange of data conserves node power. The FGSA update is represented as,

$$H_{i}^{h} \left( {k + 1} \right) = XH_{i}^{h} \left( k \right) + \frac{1}{2}XH_{i}^{h} \left( {k - 1} \right) + u_{i}^{h} \left( {k + 1} \right)$$
(8)

where \(H_{i}^{h} \left( {k + 1} \right)\) is the location of agent \(i\) in the \(h{\text{th}}\) cluster at time \(k + 1\), \(H_{i}^{h} \left( k \right)\) signifies the location of agent \(i\) at the present iteration \(k\), \(H_{i}^{h} \left( {k - 1} \right)\) refers to the position of agent \(i\) in the previous iteration, \(u_{i}^{h} (k + 1)\) signifies the velocity evaluated by GSA at iteration \((k + 1)\) using the location of agent \(i\) in the \(h{\text{th}}\) cluster, and \(X\) is a constant such that \(0 \le X \le 1\). The data obtained through the best path is given as \(J\).
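A minimal sketch of the position update of Eq. (8) is shown below, assuming the velocity \(u_{i}^{h}(k+1)\) is supplied by the standard GSA step.

```python
import numpy as np

def fgsa_position_update(H_k, H_k_minus_1, velocity, X=0.5):
    """Fractional position update of Eq. (8): the new location blends the current
    and previous positions (fractional memory) with the GSA velocity term.
    All arguments are numpy arrays of the agent's coordinates; X lies in [0, 1]."""
    return X * H_k + 0.5 * X * H_k_minus_1 + velocity

# Hypothetical one-step example for a single agent in 2-D.
H_prev, H_curr = np.array([1.0, 2.0]), np.array([1.5, 2.2])
u_next = np.array([0.1, -0.05])    # velocity supplied by the GSA step
H_next = fgsa_position_update(H_curr, H_prev, u_next, X=0.5)
```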

4 Proposed FrMSO-Based DBN for Cyber Forensic Investigation in IoT

The huge expansion of IoT has resulted in huge data, and analyzing these data over the internet is a complex task; thus, big data technology is utilized. The quick increase in data and the availability of hacking methods have made IoT devices susceptible to cyber-attacks. Cyber forensics aims to inspect data violations by mining data from devices over the network. The goal is to devise a model for cyber forensic investigation with deep learning-based feature fusion on a big data-based IoT platform. Here, the IoT network is simulated initially, the devices collect information, and the collected information is routed to the BS using FGSA [20]. Once routing is done, cybercrime detection is performed at the BS, where the data is partitioned using an enhanced FCM (eFCM) [21]. The partitioned data are then provided to the MapReduce framework, which contains two stages, namely mapper and reducer. In the mapper phase, feature fusion is done with the help of mutual information and the DQNN [22]. In the reducer phase, malware detection is performed using the DBN [23], which is trained with the proposed FrMSO algorithm, newly devised by integrating FC [24], MA [25], and SSOA [26]. Figure 2 reveals the structure of the malware detection model using the proposed FrMSO-based DBN.

Fig. 2

Configuration of cybercrime detection using FrMSO-based DBN

4.1 Data Acquisition

Assume a database \(L\) having several data samples, expressed as,

$$L = \{ T_{1} ,T_{2} , \ldots T_{r} , \ldots ,T_{u} \}$$
(9)

where \(u\) refers to the total number of data samples and \(T_{r}\) refers to the \(r{\text{th}}\) data sample, which is of dimension \(100000 \times 100\).

4.2 Partitioning of Data with eFCM

The partitioning of dataset \(L\) is done using eFCM [21]. The aim of eFCM is to overcome probable local optima, and it enhances the computational efficiency of traditional FCM. The objective function considered for eFCM is stated as,

$$A(D) = \sum\limits_{a = 1}^{E} {\sum\limits_{b = 1}^{D} {\left( {U_{a,b} } \right)^{t} f_{a,b}^{2} } }$$
(10)

where \(D\) refers to the number of clusters such that \(1 \le b \le D\), \(f_{a,b}\) symbolizes the distance from the \(a{\text{th}}\) point to the \(b{\text{th}}\) centroid, \(U_{a,b}\) signifies the membership value of the \(a{\text{th}}\) point in the \(b{\text{th}}\) cluster, \(t\) expresses the fuzzifier, and \(E\) refers to the total points such that \(1 \le a \le E\).

The membership function is given by,

$$U_{a,b} = \frac{1}{{\sum\limits_{m = 1}^{D} {\left( {\frac{{f_{a,b} }}{{f_{a,m} }}} \right)^{{\frac{2}{t - 1}}} } }}$$
(11)

where \(f_{a,m}\) symbolizes the distance from the \(a{\text{th}}\) point to the \(m{\text{th}}\) centroid, and \(m\) is the cluster index.

The cluster centroid matrix is expressed by,

$$S_{b} = \frac{{\sum\limits_{a = 1}^{E} {U_{a,b}^{t} .e_{a} } }}{{\sum\limits_{a = 1}^{E} {U_{a,b}^{t} } }}$$
(12)

where \(e_{a}\) refers to the \(a{\text{th}}\) data point.

The steps considered in eFCM are stated below.

(i) The membership matrix is initialized such that \(R = \left[ {U_{a,b} } \right]\) with starting value \(R^{(0)}\).

(ii) Initialize centroids.

a) The first centroid is sampled uniformly from the data.

b) Each subsequent centroid is sampled from the data with probability proportional to its squared distance from the nearest existing centroid.

c) The above step is repeated until the count of centroids equals the needed number of clusters.

(iii) Evaluate the membership and cluster centroid matrices using Eqs. (11) and (12).

(iv) Evaluate \(R^{(\ell + 1)}\).

(v) If \(||R^{(\ell + 1)} - R^{(\ell )} || < \omega\), convergence is reached; else go to step (iii).

Hence, the partitioned data obtained from input data \(T_{r}\) using eFCM is given by,

$$T_{r} = \left\{ {\rho_{\nu } } \right\}\,;\,1 \le \nu \le \eta$$
(13)

where \(\eta\) is the total partitioned data. Here, the partitioned data obtained at mapper-1 is of dimension \(25000 \times 100\), at mapper-2 of dimension \(50000 \times 100\), and at mapper-n of dimension \(25000 \times 100\).
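A compact Python sketch of the eFCM steps (i)–(v) is given below; the fuzzifier \(t\), tolerance \(\omega\), and seeding details are assumptions, since [21] is not reproduced here.

```python
import numpy as np

def efcm(data, n_clusters, t=2.0, tol=1e-4, max_iter=100, seed=0):
    """Sketch of eFCM: FCM with k-means++-style centroid seeding (steps i-v)."""
    rng = np.random.default_rng(seed)
    E = data.shape[0]

    # Step (ii): first centroid uniformly; the rest proportional to squared distance.
    centroids = [data[rng.integers(E)]]
    for _ in range(n_clusters - 1):
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(data[rng.choice(E, p=d2 / d2.sum())])
    S = np.array(centroids)

    U = np.zeros((E, n_clusters))
    for _ in range(max_iter):
        # Membership update, Eq. (11): distances to every centroid.
        f = np.linalg.norm(data[:, None, :] - S[None, :, :], axis=2) + 1e-12
        ratio = (f[:, :, None] / f[:, None, :]) ** (2.0 / (t - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        # Centroid update, Eq. (12): fuzzified weighted mean of the points.
        W = U_new ** t
        S = (W.T @ data) / W.sum(axis=0)[:, None]
        # Step (v): stop once the membership matrix converges.
        if np.linalg.norm(U_new - U) < tol:
            break
        U = U_new
    return U_new, S
```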

4.3 Mapper and Reducer Phase

MapReduce indicates a programming method that comprises a set of mappers and reducers. It carries out cybercrime detection by processing data in parallel; thus, MapReduce assists in controlling large data. Here, feature fusion is done in the mapper using the DQNN [22], and the cybercrime detection process is performed in the reducer. The partitioned data \(\rho_{v}\) is provided to MapReduce, in which the partitioned data is split into a number of parts equal to the total mappers. Figure 3 reveals MapReduce for identifying cybercrime.
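The following toy Python sketch illustrates the mapper/reducer pattern only; the function names are hypothetical, and a real deployment would distribute the mappers across workers.

```python
from collections import defaultdict

def run_mapreduce(partitions, mapper, reducer):
    """Toy in-process MapReduce: each mapper emits (key, value) pairs from its
    partition; values are grouped by key and handed to the reducer."""
    grouped = defaultdict(list)
    for part in partitions:                 # in production these run in parallel
        for key, value in mapper(part):
            grouped[key].append(value)
    return {key: reducer(key, vals) for key, vals in grouped.items()}

# Hypothetical usage: mappers emit fused feature vectors, the reducer classifies.
# results = run_mapreduce(partitioned_data, feature_fusion_mapper, detection_reducer)
```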

Fig. 3

MapReduce for cybercrime discovery

4.3.1 Mapper Phase

In the mapper, the partitioned data \(\rho_{v}\) is adapted for performing feature fusion, which includes two steps: sorting of features and fusion of features with the DQNN.

4.3.1.1 Feature Fusion

Once data partitioning is done, features are acquired from the data and sorted using mutual information (MI) in accordance with the number of features to be selected, formulated as,

$$F_{k}^{new} = \sum\limits_{i = 1}^{N} {\frac{\beta }{i}} F_{i}$$
(14)
$$i = 1 + \frac{p}{q}$$
(15)
$$q = \frac{p}{N}$$
(16)

where \(N\) signifies the number of features to be chosen, \(p\) represents the total number of features, \(q\) is the resulting group size, and \(\beta\) refers to the optimal parameter.

4.3.1.2 Generating β Using DQNN

In the DQNN [22], training data with a feature size of \(100 \times 10\) are considered, and all features are fed to the DQNN for attaining the target value. The training data is split into two classes, namely class 0 and class 1; class 0 contains \(60 \times 10\) samples and class 1 comprises \(40 \times 10\) samples. The mean of each class is computed and modelled as a \(1 \times 10\) vector.

Meanwhile, the optimum value of factor \(\beta\) is discovered with the DQNN. During the training process, the optimal parameter \(\beta\) is evaluated as,

$$\beta = MI\left( {d_{i} ,\lambda_{i} } \right)$$
(17)

Here, \(d_{i}\) depicts a data instance, \(\lambda_{i}\) refers to the average of the instances \(d_{i}\) belonging to the same class, and \(MI\) symbolizes mutual information. The MI between feature \(\lambda_{i}\) and target \(d_{i}\), whose joint distribution is \(P(\lambda_{i} ,d_{i} )\), is formulated as,

$$MI(\lambda_{i} ,d_{i} ) = \sum\limits_{{\lambda_{i} \in B}} {\sum\limits_{{d_{i} \in C}} {P_{{\lambda_{i} ,d_{i} }} (B,C)\log \frac{{P_{{\lambda_{i} ,d_{i} }} (B,C)}}{{P_{{\lambda_{i} }} (B)\,P_{{d_{i} }} (C)}}} }$$
(18)

where \(P_{{\lambda_{i} ,d_{i} }} (B,C)\) signifies the joint probability mass function of \(\lambda_{i}\) and \(d_{i}\), \(P_{{\lambda_{i} }}\) and \(P_{{d_{i} }}\) refer to the marginal probability mass functions, \(\lambda_{i}\) is the feature, and \(d_{i}\) symbolizes the target. After computing the MI of each feature, the top "m" features with elevated MI values are selected. The features chosen with MI are expressed as \(\vartheta\) and are given as input to the reducer, denoted as \(\gamma\).
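The following sketch illustrates the MI-based ranking and one literal reading of the fusion rule of Eq. (14); it uses scikit-learn's MI estimator in place of the paper's DQNN-driven computation, so it is an approximation of the described pipeline.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def sort_features_by_mi(X, y, m):
    """Rank features by mutual information with the target, Eq. (18),
    and keep the top m of them."""
    mi = mutual_info_classif(X, y)
    top = np.argsort(mi)[::-1][:m]
    return X[:, top], top

def fuse_features(F_sorted, beta):
    """One literal reading of Eq. (14): the fused feature is the beta/i-weighted
    sum of the N sorted features, computed per sample."""
    N = F_sorted.shape[1]
    weights = beta / np.arange(1, N + 1)
    return F_sorted @ weights

# Hypothetical usage on a partitioned block (X, y) inside a mapper:
# X_top, idx = sort_features_by_mi(X, y, m=10)
# fused = fuse_features(X_top, beta=0.8)
```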

Figure 4 depicts the training process adapted for discovering the optimum parameter \(\beta\). The class label in training is expressed as \(\alpha\), such that the optimum value of \(\beta\) is evaluated for each data instance with the aforementioned formula.

Fig. 4

Training procedure for evaluating \(\beta\)

4.3.1.3 Structure of DQNN

The optimum value of parameter \(\beta\) is evaluated with the DQNN [22], considering the total features \(\left( p \right)\) as input.

Assume a quantum perceptron with \(R\) input qubits and \(V\) output qubits. The perceptron is an arbitrary unitary operator applied to the \(\left( {R + V} \right)\) input and output qubits, which depends on \(\left( {2^{R + V} } \right)^{2} - 1\) parameters. The input qubits are described by an unknown mixed state \(G^{in}\), while the output qubits are initialized in the fiducial product state \(\left| {0...0} \right\rangle_{out}\). The network represents a quantum circuit that contains \(I\) hidden layers, acts on the initial input state \(G^{in}\), and produces a mixed output state \(G^{out}\) formulated as,

$$G^{out} = J_{in,hid} \left( {W\left( {G^{in} \otimes \left| {0...0} \right\rangle_{hid,out} \left\langle {0...0} \right|} \right)W^{ + } } \right)$$
(19)

where \(W = V^{out} V^{Q} V^{Q - 1} ...V^{1}\) is the quantum circuit and \(V^{\lambda }\) represents the layer unitaries, each a product of quantum perceptrons acting on layers \(\left( {\varphi - 1} \right)\) and \(\varphi\). A fundamental feature is that the output is described by a composition of completely positive layer-to-layer transition maps \(P^{\varphi }\):

$$G^{out} = P^{out} \left( {P^{I} \left( {...P^{2} \left( {P^{1} \left( {G^{in} } \right)} \right)...} \right)} \right)$$
(20)

where,

$$P^{\varphi } \left( {H^{\varphi - 1} } \right) \equiv J_{\varphi - 1} \left( {\prod\nolimits_{{\delta = S_{\lambda } }}^{1} {W_{\delta }^{\varphi } \left( {H^{\varphi - 1} \otimes \left| {0 \ldots 0} \right\rangle_{\varphi } \left\langle {0 \ldots 0} \right|} \right)\prod\nolimits_{\delta = 1}^{{S_{\lambda } }} {W_{\delta }^{\varphi + } } } } \right)$$
(21)

Here, \(W_{\delta }^{\varphi }\) denotes the \(\delta{\text{th}}\) perceptron acting on layers \(\left( {\varphi - 1} \right)\) and \(\varphi\), \(S_{\lambda }\) signifies the total perceptrons acting on those layers, and \(W\) refers to the controlled unitary. The output \(G^{out}\) acquired by the network reveals the optimal value of \(\beta\). Hence, the feature fusions produced from mapper-1, mapper-2, and mapper-n are \(A_{1} ,A_{2} , \ldots ,A_{n}\), which are further fed as input to the reducer.
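As a toy illustration of Eq. (19), the sketch below propagates a one-qubit mixed state through a single randomly drawn perceptron unitary and traces out the input register; a real DQNN would train the unitaries rather than sample them, so this is only a numerical reading of the formula.

```python
import numpy as np

def random_unitary(n, rng):
    """Random unitary via QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q @ np.diag(np.diagonal(r) / np.abs(np.diagonal(r)))

def perceptron_layer(rho_in, R, V, rng):
    """One quantum perceptron in the spirit of Eq. (19): apply a unitary W to the
    input state joined with output qubits in |0...0>, then trace out the input
    register, leaving the mixed output state G^out."""
    d_in, d_out = 2 ** R, 2 ** V
    W = random_unitary(d_in * d_out, rng)
    zero = np.zeros((d_out, d_out), dtype=complex)
    zero[0, 0] = 1.0                       # |0...0><0...0| on the output register
    joint = W @ np.kron(rho_in, zero) @ W.conj().T
    # Partial trace over the input subsystem.
    return np.einsum('ijik->jk', joint.reshape(d_in, d_out, d_in, d_out))

rng = np.random.default_rng(0)
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)  # 1-qubit input state
G_out = perceptron_layer(rho, R=1, V=1, rng=rng)
```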

4.3.2 Reducer Phase

Here, cybercrime detection is performed with the FrMSO-based DBN in the reducer \(\gamma\), which takes the feature fusions \(A_{1} ,A_{2} , \ldots ,A_{n}\) produced by mapper-1, mapper-2, and mapper-n as input. The DBN [23] is trained with FrMSO, which is obtained by integrating the advantages of FC [24], SSOA [26], and MA [25]. The DBN and its training with FrMSO are illustrated below.

4.3.2.1 Structure of DBN

The DBN [23] is a type of Deep Neural Network (DNN) that contains several layers of Restricted Boltzmann Machines (RBMs) followed by a Multilayer Perceptron (MLP). It considers the selected features, denoted as \(f_{t}\), as its input. RBMs comprise visible and hidden units linked by weighted connections. They can be utilized for unsupervised learning tasks to reduce feature dimensionality, and the stack can also be used for supervised learning tasks.

The input fed to the visible layer indicates the features, and the visible and hidden layers of the first RBM are modelled as,

$$\varepsilon^{1} = \left\{ {\varepsilon_{1}^{1} ,\varepsilon_{2}^{1} , \ldots ,\varepsilon_{g}^{1} , \ldots ,\varepsilon_{l}^{1} } \right\}\,;\,1 \le g \le \ell$$
(22)
$$\kappa^{1} = \left\{ {\kappa_{1}^{1} ,\kappa_{2}^{1} , \ldots ,\kappa_{e}^{1} , \ldots ,\kappa_{v}^{1} } \right\}\,;\,1 \le e \le v$$
(23)

where \(\varepsilon_{g}^{1}\) expresses the \(g{\text{th}}\) visible neuron in the first RBM layer, \(\kappa_{e}^{1}\) signifies the \(e{\text{th}}\) hidden neuron, and \(v\) expresses the total hidden neurons. Let \(\rho\) express the bias of the visible layer and \(\mu\) signify the bias of the hidden layer, represented as,

$$\rho^{1} = \left\{ {\rho_{1}^{1} ,\rho_{2}^{1} , \ldots ,\rho_{g}^{1} , \ldots ,\rho_{\ell }^{1} } \right\}$$
(24)
$$\mu^{1} = \left\{ {\mu_{1}^{1} ,\mu_{2}^{1} , \ldots ,\mu_{e}^{1} , \ldots ,\mu_{v}^{1} } \right\}$$
(25)

where \(\rho_{g}^{1}\) signifies the bias associated with the \(g{\text{th}}\) visible neuron, and \(\mu_{e}^{1}\) symbolizes the bias associated with the \(e{\text{th}}\) hidden neuron. The weights employed in the first RBM are given by,

$$\varpi^{1} = \left\{ {\varpi_{g,e}^{1} } \right\}\,;\,1 \le g \le \ell \,;\,1 \le e \le v$$
(26)

where \(\varpi_{g,e}^{1}\) expresses the weight between the \(g{\text{th}}\) visible neuron and the \(e{\text{th}}\) hidden neuron. The output of the hidden layer of the first RBM is represented as,

$$\kappa_{e}^{1} = \alpha \left[ {\mu_{e}^{1} + \sum\limits_{g} {\varepsilon_{g}^{1} \varpi_{g,e}^{1} } } \right]$$
(27)

where \(\alpha\) symbolizes the activation function. The outputs produced by the first RBM are given by,

$$\kappa^{1} = \left\{ {\kappa_{e}^{1} } \right\}\,;\,1 \le e \le v$$
(28)

The visible neurons of the second RBM take the first RBM's hidden output, expressed as,

$$\varepsilon^{2} = \left\{ {\varepsilon_{1}^{2} ,\varepsilon_{2}^{2} , \ldots ,\varepsilon_{\ell }^{2} } \right\}\, = \left\{ {\kappa_{e}^{1} } \right\}\,;\,1 \le e \le v$$
(29)

where \(\left\{ {\kappa_{e}^{1} } \right\}\) refers to the output of the first RBM layer.

The hidden layer of the second RBM is formulated as,

$$\kappa^{2} = \left\{ {\kappa_{1}^{2} ,\kappa_{2}^{2} , \ldots ,\kappa_{e}^{2} , \ldots ,\kappa_{v}^{2} } \right\}\,;\,1 \le e \le v$$
(30)

The biases of the visible and hidden layers of the second RBM, denoted \(\rho^{2}\) and \(\mu^{2}\), follow Eqs. (24) and (25). The weight vector of the second RBM is modelled as,

$$\varpi^{2} = \left\{ {\varpi_{ee}^{2} } \right\}\,;\,1 \le e \le v$$
(31)

where \(\varpi_{ee}^{2}\) refers to the weight between the \(e{\text{th}}\) visible neuron and the \(e{\text{th}}\) hidden neuron of the second RBM. The output of the \(e{\text{th}}\) hidden neuron is expressed as,

$$\kappa_{e}^{2} = \alpha \left[ {\mu_{e}^{2} + \sum\limits_{g} {\varepsilon_{g}^{2} \varpi_{ee}^{2} } } \right]\forall \varepsilon_{g}^{2} = \kappa_{e}^{1}$$
(32)

where \(\mu_{e}^{2}\) refers to the bias linked with the \(e{\text{th}}\) hidden neuron. Hence, the output of the hidden layer generated is modelled as,

$$\kappa^{2} = \left\{ {\kappa_{e}^{2} } \right\}\,;\,1 \le e \le v$$
(33)

The MLP input is modelled by,

$$Y = \left\{ {Y_{1} ,Y_{2} , \ldots ,Y_{e} , \ldots ,Y_{v} } \right\}\, = \left\{ {\kappa_{e}^{2} } \right\}\,;\,1 \le e \le v$$
(34)

where \(v\) refers to the count of neurons in the input layer.

The hidden layer of MLP is given by,

$$Z = \left\{ {Z_{1} ,Z_{2} , \ldots ,Z_{N} , \ldots ,Z_{o} } \right\}\,;\,1 \le N \le o$$
(35)

where \(o\) signifies the total hidden neurons. The MLP output is modelled as,

$$O = \left\{ {O_{1} ,O_{2} , \ldots ,O_{i} , \ldots ,O_{j} } \right\}\,;\,1 \le i \le j$$
(36)

where \(j\) refers to the count of output neurons. Let \(\varpi^{\prime}\) indicate the weight vector between the input and hidden layers, formulated as,

$$\varpi^{\prime} = \left\{ {\varpi_{eN}^{\prime} } \right\}\,;\,1 \le e \le v\,;\,1 \le N \le o$$
(37)

where \(\varpi_{eN}^{\prime}\) refers to the weight between the \(e{\text{th}}\) input neuron and the \(N{\text{th}}\) hidden neuron. The hidden layer output is expressed as,

$$Z_{N} = \left[ {\sum\limits_{e = 1}^{v} {\varpi_{eN}^{\prime} * Y_{e} } } \right]L_{N} \forall Y_{e} = \kappa_{e}^{2}$$
(38)

where \(L_{N}\) refers to the bias of the hidden neuron. The weights between the hidden layer and the output layer are expressed as \(\varpi^{\prime\prime}\), modelled as,

$$\varpi^{\prime\prime} = \left\{ {\varpi_{Ni}^{\prime\prime} } \right\}\,;\,1 \le N \le o\,;\,1 \le i \le j$$
(39)

Hence, the output vector obtained with weight \(\varpi^{\prime\prime}\) and the hidden layer output is given by,

$$O_{i} = \sum\limits_{N = 1}^{o} {\varpi_{Ni}^{\prime\prime} * Z_{N} }$$
(40)

where \(\varpi_{Ni}^{\prime\prime}\) signifies the weight between the \(N{\text{th}}\) hidden neuron and the \(i{\text{th}}\) output neuron, and \(Z_{N}\) signifies the hidden layer output. The DBN output is given by \(\upsilon\).
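A minimal forward pass of the stack described above (two RBM layers feeding an MLP readout) is sketched below; the weights are random stand-ins for the values FrMSO would tune, and the MLP hidden layer is folded into a single readout for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DBNSketch:
    """Forward pass of two RBM layers plus an MLP readout; weight shapes follow
    Eqs. (26)-(40), with randomly initialised stand-in values."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))      # first RBM
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # second RBM
        self.b2 = np.zeros(n_hidden)
        self.w3 = rng.normal(scale=0.1, size=(n_hidden, n_out))     # MLP readout

    def forward(self, x):
        k1 = sigmoid(self.b1 + x @ self.w1)    # Eq. (27)
        k2 = sigmoid(self.b2 + k1 @ self.w2)   # Eq. (32)
        return k2 @ self.w3                    # readout, cf. Eqs. (34)-(40)

model = DBNSketch(n_in=10, n_hidden=8, n_out=2)
scores = model.forward(np.random.default_rng(1).random(10))
```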

4.3.2.2 DBN Training with FrMSO

The DBN [23] training is performed with FrMSO, produced by integrating the advantages of FC and MSSO; MSSO itself combines the benefits of MA [25] and SSOA [26]. MSSO helps to effectively balance exploration and exploitation and to escape local optima, and it can effectively manage engineering design and optimization problems. On the other hand, FC [24] can effectively deal with infinite terms and possesses the ability to solve differential and integral equations. Hence, the hybridization of MSSO and FC aids in producing a globally optimal solution. The FrMSO steps are enlisted below.

i) Initialization:

The foremost step is initialization, and is formulated as,

$$D = \{ D_{1} ,D_{2} , \ldots ,D_{\tau } , \ldots ,D_{\hbar } \}$$
(41)

where \(\hbar\) refers to the total solutions and \(D_{\tau }\) is the \(\tau{\text{th}}\) solution.

ii) Find error:

The error is computed to acquire the most excellent solution and is produced by,

$$MSE = \frac{1}{o}\,\left[ {\sum\limits_{z = 1}^{o} {\left( {\ell_{z} - \upsilon_{z} } \right)^{2} } } \right]$$
(42)

where \(\ell_{z}\) is the target output of the \(z{\text{th}}\) sample, \(\upsilon_{z}\) refers to the corresponding DBN output, and \(o\) symbolizes the total data.

iii) Update equation of proposed FrMSO

The update of MSSO is given by,

$$D_{a} (y + 1) = \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) + Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right]*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}}$$
(43)

where \(R\) and \(Y\) are parameters, \(rand\) signifies a random number in (0,1), \(b_{1}\) and \(b_{2}\) are positive attraction constants, \(C_{a} (y)\) refers to the velocity of the \(a{\text{th}}\) mayfly at iteration \(y\), \(pb_{a}\) is the personal best solution of the \(a{\text{th}}\) mayfly, \(gb_{a}\) signifies the global best solution of the \(a{\text{th}}\) mayfly, \(D_{g} (y)\) and \(D_{o} (y)\) are solution vectors at iteration \(y\), and \(e\) refers to the nuptial dance coefficient.

Subtracting \(D_{a} (y)\) from both sides,

$$\begin{gathered} D_{a} (y + 1) - D_{a} (y) = \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) + Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right] \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}} - D_{a} (y) \hfill \\ \end{gathered}$$
(44)

To deal with the infinite terms, the FC concept is utilized. As per FC [24], the equation is written as,

$$\begin{gathered} M^{\alpha } \left( {D_{a} (y + 1)} \right) = \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) + Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right] \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}} - D_{a} (y) \hfill \\ \end{gathered}$$
(45)
$$\begin{gathered} D_{a} (y + 1) - \alpha D_{a} (y) - \frac{1}{2}\alpha D_{a} (y - 1) - \frac{1}{6}(1 - \alpha )D_{a} (y - 2) - \frac{1}{24}\alpha (1 - \alpha )(2 - \alpha )D_{a} (y - 3) = \hfill \\ \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) + Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right]\,*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}} - D_{a} (y) \hfill \\ \end{gathered}$$
(46)
$$\begin{gathered} D_{a} (y + 1) = \alpha D_{a} (y) + \frac{1}{2}\alpha D_{a} (y - 1) + \frac{1}{6}(1 - \alpha )D_{a} (y - 2) + \frac{1}{24}\alpha (1 - \alpha )(2 - \alpha )D_{a} (y - 3) + \hfill \\ \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) + Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right]\,*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}} - D_{a} (y) \hfill \\ \end{gathered}$$
(47)

The update of FrMSO is provided as,

$$\begin{gathered} D_{a} (y + 1) = \left[ \begin{gathered} C_{a} (y) + b_{1} e^{{ - \beta r_{p}^{2} }} pb_{a} + b_{2} e^{{ - \beta r_{g}^{2} }} gb_{a} \hfill \\ - \frac{{rand\left( {R \circ D_{g} (y) - Y \circ D_{o} (y)} \right)}}{{\left[ {1 - R \times rand - Y \times rand} \right]}}\left[ {1 - b_{1} e^{{ - \beta r_{p}^{2} }} - b_{2} e^{{ - \beta r_{g}^{2} }} } \right] \hfill \\ \end{gathered} \right]\,*\frac{{\left[ {1 - R \times rand - Y \times rand} \right]}}{{b_{1} e^{{ - \beta r_{p}^{2} }} + b_{2} e^{{ - \beta r_{g}^{2} }} - R \times rand - Y \times rand}} \hfill \\ - D_{a} (y)(1 - \alpha ) + \frac{1}{2}\alpha D_{a} (y - 1) + \frac{1}{6}(1 - \alpha )D_{a} (y - 2) + \frac{1}{24}\alpha (1 - \alpha )(2 - \alpha )D_{a} (y - 3) \hfill \\ \end{gathered}$$
(48)

iv) Check feasibility:

The error is computed, and the solution producing the least error is chosen as the optimum solution.

v) Termination:

The aforesaid steps are repeated until the optimum solution is acquired. Table 1 presents the pseudo code of FrMSO.

Table 1 Pseudo code of FrMSO
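The following Python sketch performs one FrMSO position update in the spirit of the final update equation above; the parameter values, and the choice of \(r_{p}\) and \(r_{g}\) as distances to the personal and global bests, are assumptions rather than settings reported in the paper.

```python
import numpy as np

def frmso_step(D_hist, C, pb, gb, D_g, D_o, alpha=0.4, b1=1.0, b2=1.5,
               beta=2.0, R=0.5, Y=0.5, rng=None):
    """One FrMSO update: the MSSO move plus fractional memory of the last
    four positions D(y), D(y-1), D(y-2), D(y-3)."""
    if rng is None:
        rng = np.random.default_rng()
    D_y, D_y1, D_y2, D_y3 = D_hist
    r_p = np.linalg.norm(D_y - pb)             # distance to personal best (assumed)
    r_g = np.linalg.norm(D_y - gb)             # distance to global best (assumed)
    a1 = b1 * np.exp(-beta * r_p ** 2)
    a2 = b2 * np.exp(-beta * r_g ** 2)
    rnd = rng.random()
    denom = 1.0 - R * rnd - Y * rnd
    bracket = (C + a1 * pb + a2 * gb
               - rnd * (R * D_g + Y * D_o) / denom * (1.0 - a1 - a2))
    msso = bracket * denom / (a1 + a2 - R * rnd - Y * rnd)
    # Fractional memory terms of the update equation.
    frac = (0.5 * alpha * D_y1
            + (1.0 - alpha) * D_y2 / 6.0
            + alpha * (1.0 - alpha) * (2.0 - alpha) * D_y3 / 24.0)
    return msso - (1.0 - alpha) * D_y + frac

# Hypothetical one-step usage for a 3-dimensional solution vector.
hist = [np.ones(3) * k for k in (4, 3, 2, 1)]   # D(y), D(y-1), D(y-2), D(y-3)
new_pos = frmso_step(hist, C=np.zeros(3), pb=np.ones(3), gb=np.full(3, 1.2),
                     D_g=np.ones(3), D_o=np.ones(3))
```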

5 Results and Discussion

The competence of FrMSO + DBN is assessed by altering the fraction of data considered for training.

5.1 Experimental Set-Up

FrMSO + DBN is implemented in Python on a PC with Windows 10 OS, an Intel Core i3 processor, and 8 GB RAM.

5.2 Dataset Description

The assessment is done considering the UCSD Network Telescope Aggregated DDoS Metadata [27]. This dataset represents DDoS activity observed through the UCSD Network Telescope and is accumulated from raw Telescope data according to criteria described for Internet DoS activity.

5.3 Performance Analysis

Figure 5 provides the valuation of FrMSO + DBN when changing the data taken for training, modelled using specific measures. The precision investigation is represented in Fig. 5a. For 60% training data, the precision delivered by FrMSO + DBN at iterations 5, 10, 15, and 20 is 0.843, 0.848, 0.852, and 0.856, respectively. Correspondingly, for 90% training data, the precision at iterations 5, 10, 15, and 20 is 0.953, 0.957, 0.960, and 0.964. The recall inspection is demonstrated in Fig. 5b. For 60% training data, the recall at iterations 5, 10, 15, and 20 is 0.872, 0.877, 0.881, and 0.884. Correspondingly, for 90% training data, the recall at iterations 5, 10, 15, and 20 is 0.971, 0.975, 0.978, and 0.983. The F-measure inspection is demonstrated in Fig. 5c. For 60% training data, the F-measure at iterations 5, 10, 15, and 20 is 0.807, 0.810, 0.814, and 0.819. Correspondingly, for 90% training data, the F-measure at iterations 5, 10, 15, and 20 is 0.941, 0.945, 0.950, and 0.954.

Fig. 5

Assessment of FrMSO + DBN considering a Precision b Recall c F-measure

5.4 Algorithm Analysis

The estimation considering the DBN is defined in Fig. 6. The precision examination is illustrated in Fig. 6a. For population size = 5, the precision acquired by ChoA + DBN is 0.907, MA + DBN is 0.913, SSOA + DBN is 0.917, MSSO + DBN is 0.922, and FrMSO + DBN is 0.927. Equally, for population size = 20, the precision acquired by ChoA + DBN is 0.919, MA + DBN is 0.923, SSOA + DBN is 0.929, MSSO + DBN is 0.933, and FrMSO + DBN is 0.938. The improvements of FrMSO + DBN over ChoA + DBN, MA + DBN, SSOA + DBN, and MSSO + DBN in precision are 2.025%, 1.599%, 0.959%, and 0.533%. The recall inspection is rendered in Fig. 6b. For population size = 5, the recall acquired by ChoA + DBN is 0.926, MA + DBN is 0.932, SSOA + DBN is 0.936, MSSO + DBN is 0.941, and FrMSO + DBN is 0.947. Equally, for population size = 20, the recall acquired by ChoA + DBN is 0.942, MA + DBN is 0.945, SSOA + DBN is 0.950, MSSO + DBN is 0.954, and FrMSO + DBN is 0.960. The improvements of FrMSO + DBN over ChoA + DBN, MA + DBN, SSOA + DBN, and MSSO + DBN in recall are 1.875%, 1.562%, 1.041%, and 0.625%. The F-measure examination is rendered in Fig. 6c. For population size = 5, the F-measure acquired by ChoA + DBN is 0.902, MA + DBN is 0.908, SSOA + DBN is 0.911, MSSO + DBN is 0.917, and FrMSO + DBN is 0.920. Equally, for population size = 20, the F-measure acquired by ChoA + DBN is 0.914, MA + DBN is 0.918, SSOA + DBN is 0.924, MSSO + DBN is 0.929, and FrMSO + DBN is 0.934. The improvements of FrMSO + DBN over ChoA + DBN, MA + DBN, SSOA + DBN, and MSSO + DBN in F-measure are 2.141%, 1.713%, 1.070%, and 0.535%.

Fig. 6

Algorithms analysis with DBN considering a Precision b Recall c F-measure

5.5 Comparative Assessment

The techniques considered for assessment involve the Cyber forensics framework [16], HDRaNN [17], SFJO + Deep SAE [18], NN [19], MSSO + DNFN, and the proposed FrMSO + DBN.

The assessment of the techniques is examined in Fig. 7. The precision investigation is shown in Fig. 7a. Using 60% training data, the precision attained by the existing methods is 0.756, 0.770, 0.782, 0.806, and 0.829, whereas FrMSO + DBN attains 0.856. Besides, for 90% training data, the precision attained by the existing methods is 0.867, 0.883, 0.895, 0.913, and 0.933, whereas FrMSO + DBN attains 0.964. The improvements of FrMSO + DBN over the existing methods in precision are 10.062%, 8.402%, 7.157%, 5.290%, and 3.215%. The recall investigation is shown in Fig. 7b. For 60% training data, the recall attained by the existing methods is 0.697, 0.710, 0.725, 0.747, and 0.867, whereas FrMSO + DBN attains 0.884. Also, for 90% training data, the recall attained by the existing methods is 0.817, 0.839, 0.869, 0.876, and 0.957, whereas FrMSO + DBN attains 0.983. The improvements of FrMSO + DBN over the existing methods in recall are 16.887%, 14.649%, 11.597%, 10.885%, and 2.644%. The F-measure investigation is shown in Fig. 7c. For 60% training data, the F-measure attained by the existing methods is 0.700, 0.720, 0.733, 0.763, and 0.786, whereas FrMSO + DBN attains 0.819. Also, for 90% training data, the F-measure attained by the existing methods is 0.852, 0.878, 0.889, 0.901, and 0.921, whereas FrMSO + DBN attains 0.954. The improvements of FrMSO + DBN over the existing methods in F-measure are 10.691%, 7.966%, 6.813%, 5.555%, and 3.459%.

Fig. 7

Assessment of techniques with a Precision b Recall c F-measure

6 Conclusion

This paper presented a novel cyber forensics model, namely the FrMSO-based DBN, for identifying and tracing cyber-attacks in IoT. This matters because the security stakes of IoT keep rising as industry and the public adopt novel technologies with huge volumes of streaming data. The network examination process was detailed, and the proposed FrMSO-based DBN was described. The overall processing is performed in a MapReduce framework, wherein the mappers perform feature fusion with mutual information and the DQNN, while the reducer performs cyber attack detection with the proposed FrMSO-based DBN; FrMSO is utilized for adapting the best weights of the DBN. In addition, the proposed model was validated with the dataset and certain evaluation measures to reveal its efficiency. The method presented high precision with significant time savings and provided a better balance between efficiency and detection time. The proposed FrMSO-based DBN gives better performance, with an elevated precision of 96.4%, recall of 98.3%, and F-measure of 95.4%, respectively. Future work includes executing this technique on a real platform for preventing cyber attacks and considering other databases to validate the reliability of the proposed technique.