1 Introduction

In the past few years, as a result of developments in electronics and improvements in wireless systems, the term Internet of Things (IoT) has emerged. It refers to the opportunity to connect devices so that they can share information and data while performing their individual tasks, without being bound to particular locations or physical equipment [1].

The Industrial Internet of Things (IIoT) is the application of the Internet of Things (IoT) in the industrial sector. The IIoT enables an enterprise to perform operations efficiently while maintaining quality and validation [2]. IIoT makes monitoring and maintenance tasks more convenient, which is discussed under the category of smart manufacturing systems [3]. By integrating Cyber-Physical Systems (CPS), a smart manufacturing execution system can be created that documents all data obtained from production and makes decisions based on predictions from that data, leading to better-optimized future steps [4]. IoT has been progressively adopted in different industrial sectors and has created a new revolution, IIoT or Industry 4.0 [5, 6], which improves efficiency, security and productivity in industry [7,8,9,10]. Depending on the environment and the purpose of its application, IIoT can have different architectures, but it can generally be described by a four-layered architecture, as shown in Fig. 1.

Fig. 1 Four-layered architecture of IIoT

The physical layer consists of all physical elements such as actuators, sensors, machines, etc. The network layer consists of communication networks and protocols. The middleware layer makes the communication between the second layer and the first layer possible. It consists of an application programming interface (API), database, cloud server, etc. The fourth layer, the application layer, describes the application of the IIoT. Some examples of applications are autonomous vehicles, smart homes, healthcare, etc. [11,12,13,14,15]. For instance, Rouzbahani et al. proposed an Incentive-based Demand Response Optimization (IDRO) model to efficiently schedule household appliances for minimum usage during peak hours [16, 17], which demonstrated noticeable improvements in power factor and cost savings during peak hours for individual households.

While IIoT is an excellent solution for facilitating industrial processes, its application creates new challenges. As the devices operate simultaneously, they generate valuable data for online monitoring and control of the system, and this data can also be used by an attacker to manipulate the system performance [18, 19]. Several kinds of attack can be performed on IIoT. One of them is the cyber-attack, which itself has several variations, such as Denial of Service (DoS), Datatype Probing (DP), Scan, etc.

For data processing and analysis, Machine Learning (ML) is preferred over traditional methods because of the huge quantity of data generated throughout operations. ML is considered a useful paradigm for detecting security threats [20]. Apruzzese et al. [21] investigated the effectiveness of ML for cyber threat detection in order to find and address its limitations in such tasks. Lee et al. [22] studied the same topic, but with a focus on reducing the error of the solution. These are a few examples that show the effectiveness of ML in the detection and classification of cyber-security threats.

In this paper, we propose a Snapshot Ensemble Deep Neural Network (SEDNN) for cyber-attack detection. The model achieves high accuracy in the detection of cyber threats. Note that the classification of the attacks is not considered in this paper. Section 2 presents previous work in the same area of study, Sect. 3 is devoted to the methodology, Sect. 4 presents the results, and Sect. 5 discusses the conclusion and future steps.

2 Previous Works in IIoT Security

As systems become more complex and the quantity of data becomes enormous [23,24,25], computation and control become more challenging, so that traditional methods no longer perform as expected because of latency and long response times [26, 27]. ML algorithms improve the security and reliability of industrial processes and are increasingly used to detect and address security threats in IIoT [28, 29]. Previous studies on the application of ML to IIoT security show promising results in using ML algorithms to address cyber threats.

Rouzbahani, Karimipour and Lei [30] proposed an Ensemble Deep Convolutional Neural Network (EDCNN) model for electricity theft detection in smart grids. In this study, they used a dataset consisting of the daily consumption of 42,372 users. They used an unbalanced dataset in which 8% of customers were attackers, and the rest were normal users. They compared the results with other models and concluded that EDCNN could detect electricity theft in smart grids with an accuracy of 0.981, which indicates that the model is precise.

Farahnakian and Heikkonen [31] approached intrusion detection by presenting a Deep Auto-Encoder (DAE) based system. They applied the model to the KDD-CUP’99 dataset and achieved an accuracy of 94.71% for attack detection, from which they concluded that their approach obtained better results than other deep learning-based approaches. Moukhafi et al. [32] proposed a novel hybrid of a genetic algorithm and a support vector machine with particle swarm optimization feature selection for detecting Denial of Service (DoS) attacks, which they implemented on the KDD 99 dataset, obtaining an accuracy of 96.38%. Rouzbahani et al. [33] presented research on using ML algorithms for the classification of False Data Injection (FDI) attacks in CPS.

Vajayanand et al. [34] proposed a support vector machine (SVM)-based model and, by doing so, improved the classification algorithm. They used the ADFA-LD dataset to implement their model and obtained an accuracy of 94.51%. Khalvati et al. [35] proposed a combined SVM and Bayesian model to classify IoT attacks. They evaluated their model on the KDD CUP 99 dataset and achieved an accuracy of 91.50%. Li et al. [36] proposed a bidirectional long short-term memory network with a multi-feature layer (B-MLSTM) and applied it to the classical IIoT datasets CTU-13 [37], Gas-Water [38], and AWID [39] in order to detect low-frequency and multi-stage attacks in IIoT. The model obtained an accuracy of 95.01% on CTU-13 and 97.58% on AWID. Rouzbahani et al. [40] performed cyber-attack detection in smart cyber-physical grids using different ML algorithms, with Random Forest and K-Nearest Neighbor (KNN) showing strong performance.

Overall, these investigations show that ML can efficiently and precisely detect security threats in IIoT. It is worth noting, however, that the datasets in these studies are classical datasets that are available on the internet and are considered outdated. The modern security requirements of IIoT call for newer datasets. This paper proposes a modern ML model that is implemented on a newer dataset and also addresses the compatibility of the model with resource-constrained devices.

3 Methodology

This section first gives a brief description of the dataset. It then describes the preprocessing of the dataset, the proposed model, and the evaluation parameters used to assess the model’s performance.

3.1 Dataset

The dataset used in this paper is an open-source dataset obtained from Kaggle [41]. It was provided by Pahl et al. [42]. The dataset contains communications between different IoT nodes, sensors and applications. Multiple attacks were performed on the IIoT applications, for example “spying” and “wrong setup”, which resulted in anomalies in some of the 357,952 data samples [43, 44]. This paper addresses the detection of the cyber-attacks present in the data. Classification of the attacks will be discussed in another paper.

3.2 Preprocessing of Data

In order to obtain acceptable results from ML models, a comprehensive dataset is the main requirement. Most of the time in data mining is devoted to data processing [45], and the most essential problem in data processing is missing values, which can be caused by various reasons such as power outage, sensor damage or cyber-attacks [46].

This dataset contains missing values. Deleting the affected samples would discard valuable data in their other columns. Therefore, the missing values need to be replaced. Figure 2 shows a diagram of the algorithm for attack detection. The process of replacing the missing values is as follows:

Fig. 2 Diagram of the attack detection algorithm

3.2.1 Features

First, we need to select the features on which the model will be based. Table 1 shows the features that were selected, together with the method used to encode each of them.

Table 1 Methods for feature encoding

3.2.2 Replacing Missing/NaN Values

Backward Difference Encoding: this is one of the coding systems for categorical variables. When a regression is performed on a categorical variable with K categories, the variable enters the regression as a sequence of K-1 dummy variables. The regression coefficients of these K-1 variables correspond to a set of linear hypotheses on the cell means.

In this coding system, the mean of the dependent variable for one level of the categorical variable is compared to the mean of the dependent variable for the prior adjacent level.
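The coding described above can be sketched in a few lines of Python. This is a minimal illustration of the standard backward difference contrast matrix, not the encoding code used in this paper; the function names and the example levels are hypothetical.

```python
from fractions import Fraction


def backward_difference_matrix(k):
    """Return the k x (k-1) backward difference contrast matrix.

    Column j (1-based) compares level j+1 with level j: rows 1..j hold
    -(k - j)/k and rows j+1..k hold j/k.
    """
    if k < 2:
        raise ValueError("need at least two categories")
    return [
        [Fraction(-(k - j), k) if i <= j else Fraction(j, k)
         for j in range(1, k)]
        for i in range(1, k + 1)
    ]


def encode(values, levels):
    """Map each categorical value to its row of the contrast matrix."""
    matrix = backward_difference_matrix(len(levels))
    row_of = {level: matrix[i] for i, level in enumerate(levels)}
    return [row_of[v] for v in values]


# Hypothetical ordered levels, chosen only for illustration.
LEVELS = ["low", "medium", "high", "critical"]
codes = encode(["low", "high"], LEVELS)
```

Every column sums to zero, and the rows of two adjacent levels differ only in the column that compares them, which is why each regression coefficient can be read as the difference between the means of adjacent levels.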

Label Encoding: in this encoding, an integer is assigned to each distinct value. The model should be able to distinguish between “blank,” “False,” and “None” values, so we cannot assign 0 to all of them. Table 2 shows the value assigned to each of them.

Table 2 Replacing missing values
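The idea of keeping “blank,” “False,” and “None” apart during label encoding can be sketched as follows. The integer codes in `SENTINEL_CODES`, the helper names, and the sample column are assumptions made for illustration; the values actually used are those listed in Table 2.

```python
import math

# Hypothetical codes; the values actually assigned are given in Table 2.
SENTINEL_CODES = {"<missing>": -1, "<blank>": 0, "False": 1, "None": 2}


def is_missing(value):
    """True for Python None or a float NaN (a truly absent reading)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def label_encode(column):
    """Assign an integer to every distinct value, keeping sentinels apart."""
    codes = dict(SENTINEL_CODES)
    encoded = []
    for value in column:
        if is_missing(value):
            key = "<missing>"
        elif isinstance(value, str) and value.strip() == "":
            key = "<blank>"
        else:
            key = str(value)
        if key not in codes:
            codes[key] = max(codes.values()) + 1  # next unused integer
        encoded.append(codes[key])
    return encoded, codes


# Illustrative column mixing readings, sentinels, and a NaN.
column = ["20.5", "", "False", "None", float("nan"), "20.5", "True"]
encoded, codes = label_encode(column)
```

Because missing entries receive their own code instead of being dropped, the sample is retained and the information in its other columns is preserved.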

3.3 Snapshot Ensemble Deep Neural Network

In this paper, a Snapshot Ensemble Deep Neural Network (SEDNN) is proposed to detect cyber-attacks in the dataset. The disadvantage of an ordinary Ensemble Deep Neural Network (EDNN) is its high computational cost: on ordinary hardware, training and testing times are long. To overcome this problem, this paper uses an SEDNN model [47]. The difference between an ordinary EDNN and an SEDNN is that every time the SEDNN reaches a local minimum, it saves the model’s weights and biases, and it continues to do so until the model finds the optimal minimum, resulting in a set of neural networks with low errors. After this process, the model ensembles all networks in this set to obtain the final model. The algorithm uses Gradient Descent to find the minimum in each step. Two types of activation functions were used for the DNN layers: a ReLU activation function for the first three layers and a Sigmoid function for the last layer, since this paper performs binary classification. As output, each sample in the test set is given a label of 0 (Normal) or 1 (Attack). Figure 3 shows a visualization of the proposed algorithm, and the architecture of the DNNs can be observed in Fig. 4.
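The snapshot mechanism can be illustrated with a short sketch. The text does not state how local minima are reached or how the saved networks are combined, so two assumptions are made here: a cyclic cosine-annealed learning rate, as commonly used with snapshot ensembles, drives the network into successive minima, and the ensemble averages the attack probabilities of the snapshots. All function names and numbers are hypothetical.

```python
import math


def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max):
    """Learning rate at a 0-based step under cyclic cosine annealing."""
    steps_per_cycle = math.ceil(total_steps / n_cycles)
    position = (step % steps_per_cycle) / steps_per_cycle
    return lr_max / 2 * (math.cos(math.pi * position) + 1)


def snapshot_steps(total_steps, n_cycles):
    """Steps at which a cycle ends and the weights would be saved."""
    steps_per_cycle = math.ceil(total_steps / n_cycles)
    return [s for s in range(total_steps)
            if (s + 1) % steps_per_cycle == 0]


def ensemble_predict(snapshot_probs, threshold=0.5):
    """Average per-snapshot attack probabilities, then threshold."""
    n = len(snapshot_probs)
    averaged = [sum(p) / n for p in zip(*snapshot_probs)]
    return [1 if p >= threshold else 0 for p in averaged]


# Three hypothetical snapshots scoring the same three samples.
probs = [[0.9, 0.2, 0.6],
         [0.8, 0.4, 0.3],
         [0.7, 0.1, 0.5]]
labels = ensemble_predict(probs)  # 1 = Attack, 0 = Normal
```

The learning rate restarts at `lr_max` at the beginning of each cycle and decays towards zero at its end, so a single training run yields several converged networks for roughly the cost of training one.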

Fig. 3 Architecture of the proposed algorithm

Fig. 4 Deep neural network architecture
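A forward pass through a network of this shape, three ReLU layers followed by one sigmoid unit, can be sketched in plain Python. The layer widths, the random weights, and the input vector are hypothetical and are not taken from Fig. 4; the sketch only shows how the activations lead to a label of 0 or 1.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """ReLU on every layer except the last, which is a sigmoid unit."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(activations, weights, biases, sigmoid)[0]


rng = random.Random(0)
sizes = [4, 8, 8, 4, 1]  # hypothetical widths, not taken from Fig. 4
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probability = forward([0.1, 0.5, -0.3, 0.9], layers)
label = 1 if probability >= 0.5 else 0  # 1 = Attack, 0 = Normal
```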

3.4 Evaluation Parameters

Several parameters can be used to evaluate an ML model, and this section briefly explains them. First, the terms used in their calculation need to be defined.

A true positive is an outcome where the model correctly predicts the positive class. A true negative is an outcome where the model correctly predicts the negative class. A false positive is an outcome where the model incorrectly predicts the positive class. A false negative is an outcome where the model incorrectly predicts the negative class.
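These four outcomes can be counted directly from the true and predicted labels. The sketch below assumes the label convention used in this paper, 1 for Attack (positive) and 0 for Normal (negative); the function name and the sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels (1 = positive)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["tp"] += 1   # attack correctly flagged
        elif pred == 0 and truth == 0:
            counts["tn"] += 1   # normal traffic correctly passed
        elif pred == 1 and truth == 0:
            counts["fp"] += 1   # normal traffic wrongly flagged
        else:
            counts["fn"] += 1   # attack missed
    return counts


# Illustrative labels only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
```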

Accuracy is the most common measure for evaluating an ML model, and it is defined as the ratio of correctly predicted results to the total number of predictions. It is tempting to conclude that the higher the accuracy, the better the model, but this is not true in all cases. The assumption holds only for symmetric datasets, where the numbers of false positives and false negatives are almost the same. Therefore, we have to consider other parameters to evaluate our model more reliably. The formula for accuracy is given in Eq. 1.

$$ Accuracy=\frac{T_{Pos}+{T}_{Neg}}{T_{Pos}+{T}_{Neg}+{F}_{Pos}+{F}_{Neg}} $$
(1)
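As a hedged illustration of why accuracy alone can mislead, consider a hypothetical test set (the numbers are assumed, not taken from the dataset used here) of 1000 samples containing 80 attacks, and a trivial classifier that labels every sample as normal. Then $T_{Pos}=0$, $T_{Neg}=920$, $F_{Pos}=0$ and $F_{Neg}=80$, so Eq. 1 gives

$$ Accuracy=\frac{0+920}{0+920+0+80}=0.92 $$

even though none of the 80 attacks is detected.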

Precision is the ratio of true positives to all positive predictions. The formula for precision is given in Eq. 2. High precision corresponds to a low number of false positives.

$$ Precision=\frac{T_{Pos}}{T_{Pos}+{F}_{Pos}} $$
(2)

Recall is the ratio of true positives to all actual members of the positive class (true positives and false negatives). The formula for recall is given in Eq. 3.

$$ Recall=\frac{T_{Pos}}{T_{Pos}+{F}_{Neg}} $$
(3)

F1-Score is the harmonic mean of Precision and Recall. Therefore, it takes both false positives and false negatives into account. The formula for the F1-score is given in Eq. 4.

$$ F1- Score=\frac{2\times \left( Precision\times Recall\right)}{Precision+ Recall} $$
(4)
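The four evaluation parameters can be computed together from the confusion counts. The sketch below uses the standard definitions, with the F1-score taken as the harmonic mean of precision and recall; the function name and the counts in the usage line are illustrative and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion counts.

    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
```

For a classifier that never predicts the positive class, precision, recall and F1 are all reported as 0.0 here, which exposes the failure that accuracy alone can hide on an unbalanced dataset.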

4 Implementation and Results

This section first describes the hardware and software used, and then presents the results in detail.

4.1 Software and Hardware

The proposed model was tested using Python 3.7.4 on a system with an Intel Core i7-97580H CPU and 16.0 GB of RAM, and the model was implemented in TensorFlow. To analyze the performance of the model, we need to obtain the confusion matrix, which provides the numbers of true positives, false positives, true negatives and false negatives.

4.2 Results

The general form of a confusion matrix can be observed in Table 3.

Table 3 Confusion matrix

In this research, different classifiers were tested on the dataset in order to compare their results and accuracy. Table 4 shows the confusion matrix of the proposed model, and Table 5 presents the proposed model’s performance in terms of the evaluation parameters.

Table 4 Confusion matrix of the proposed model
Table 5 Result comparison of different classifiers

As can be seen from Tables 4 and 5, the model presents promising results. The obtained accuracy of 90.58% and F1-Score of 90.48% indicate that SEDNN performs well in detecting cyber-attacks in IIoT applications. Figures 5 and 6 show the accuracy and loss of the model.

Fig. 5 SEDNN accuracy rate

Fig. 6 SEDNN loss rate

In Fig. 5, the accuracy is not stable. This is caused by the change of DNN at each local minimum: every time the algorithm reaches a local minimum, it continues with a new DNN with new weights and biases. Overall, the accuracy on the test set is higher than on the train set, which indicates that the model generalizes well to unseen data.

Figure 6 shows the loss curves of the train and test sets. The noise in the test curve is caused by the use of multiple DNNs between local minima, as described above. Overall, the loss on the test set is lower than on the train set, which again indicates that the model performs well.

5 Conclusion and Future Work

In this paper, an SEDNN model was proposed for cyber-attack detection in industrial IoT systems. As the model searches for a global minimum, it saves the weights and biases of the DNN at every local minimum it finds (snapshots), and when it reaches the global minimum, it builds the final model from this set of DNNs instead of training and testing different models on the entire dataset. The proposed model achieves an accuracy of 90.58% in cyber-attack detection. The model was tested on an open-source dataset, DS2OS, on which it showed promising results. The dataset consists of communication between different IoT nodes such as sensors and actuators. In future work, more real-time experiments and investigations can be conducted to test the proposed model on real IIoT systems; furthermore, classification of the attacks with the proposed model will be addressed in future research.