1 Introduction

The Internet of Things (IoT) is a rapidly evolving technology that uses networking to connect infrastructure, digital devices, physical objects, applications, and people [1]. The Internet of Medical Things (IoMT) is the application of the IoT in the health care sector [2, 3]. Smart medical gadgets have undeniably made life simpler and healthier for many people. However, security and privacy in IoMT systems remain unsolved issues [4, 5]. Hence, cyber-attack detection systems are considered a defensive layer for IoMT devices and networks. Machine learning and deep learning have been employed for intrusion and cyber-attack detection in IoMT systems, with solutions ranging from models embedded on the gadgets to cloud-based systems. However, the chips and gadgets lack the capacity to hold both the models and the IoMT network data. Additionally, cloud-based systems are centralized, and their detection is subject to delay. Hence, new approaches to network cyber-attack detection are required to overcome these limitations.

Fog computing is a novel concept that was developed to address the cloud's latency, centralization, and privacy problems [6]. In fog-oriented IoT, some cloud computing responsibilities are moved closer to the smart devices [7]. Moreover, a fog node can serve as the first line of defense for small devices that lack security features [8]. Fog-based attack detection is not widely used, especially for IoMT systems, and only a few studies have proposed a fog-based detection system. The authors in [9] presented a distributed Intrusion Detection System (IDS) based on the fog-computing principle. Their system is designed for smart medical systems and uses an online sequential extreme learning machine method (EOS-ELM). They demonstrated that their system is superior to cloud-based systems in detection time and true positive rate. In another study [10], the authors used ensemble learning for binary network cyber-attack detection in the IoMT system, combining an ensemble of Decision Tree, Naive Bayes, and Random Forest classifiers with XGBoost in a fog-cloud architecture. Because their system is too heavy for fog devices, they recommend using cloud computing for training and fog computing for testing. Another study [11] employs an ensemble incremental learning technique on fog devices for network intrusion detection in medical IoT networks. However, the dataset it uses is not a recent IoT dataset. Based on these research gaps, we propose a fog-based attack detection system for the IoMT that uses incremental ensemble learning. The proposed system has a twofold advantage: first, cyber-attacks are detected accurately and as soon as they appear; second, the system is lightweight and does not use many resources.

2 Methodology

2.1 Datasets

We used two datasets, NSL-KDD and ToN-IoT. NSL-KDD is a well-known benchmark dataset that was originally developed for conventional networks and has been used by many researchers [12]; we include it for comparison purposes. The dataset has 41 features and a total of 148,517 samples across its train and test sets. It contains Normal, Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L) samples. The count for each class is shown in Fig. 1. The second dataset is a new cyber-attack dataset that was developed for IoT and IIoT systems. It was built in a real-world IoT network context, using seven different sensors and telemetry services, and therefore reflects the diversity of IoT systems. The dataset consists of the IoT system's network traffic, converted into NetFlow files [13]. The NetFlow format is lighter than payload data because it uses only metadata instead of the packet contents. Additionally, since it does not use the packets that hold patients' data, it does not violate privacy rules, which makes the approach more compatible with an IoMT system. This version of the data was curated, with the most informative records and features selected to achieve high performance [13]. It has 1,379,274 records and 13 features. The dataset contains multiple attack types, whose sample counts are shown in Fig. 1.

Fig. 1. The sample count of the attacks and normal class in the (a) NSL-KDD and (b) ToN-IoT_NetFlow datasets.

2.2 A Lightweight Network Attack Detection System

The proposed system in this study follows the guidelines of fog-computing architecture by IEEE [8]. Figure 2 illustrates the proposed system.

In Fig. 2, the red alert symbol marks missing security measures. Because of this lack of security, the medical devices and their network communication at the edge and fog layers are exposed to various attacks. The amount of network data arriving through the fog devices grows over time into massive volumes, which fog devices cannot store efficiently. Training on the data in stages is therefore preferable to retraining on the whole dataset every time new data accumulate. We used a sliding-window setting to train the classifiers incrementally. Unlike batch learning, which uses cross-validation or hold-out evaluation, our online learning uses prequential evaluation, in which each incoming sample is first used to test the model and then to train it, one record at a time. In this experiment, the maximum memory was set to 5,000 samples, and the sliding window was set to 1,000 samples.
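The test-then-train loop behind prequential evaluation can be sketched as follows. This is a minimal illustration and not the implementation used in the experiments: `MajorityClassLearner` is a hypothetical toy stand-in for an incremental classifier, the stream is synthetic, and the `window` argument is assumed here to be the number of most recent samples over which accuracy is reported. The maximum-memory setting is not modeled.

```python
from collections import Counter, deque


class MajorityClassLearner:
    """Toy incremental learner (hypothetical stand-in for a Hoeffding Tree)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Predict the most frequent label seen so far; None before any training.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Update the label counts with a single labeled record.
        self.counts[y] += 1


def prequential_accuracy(stream, model, window=1000):
    """Test-then-train over (x, y) pairs; return the windowed accuracy after each sample."""
    recent = deque(maxlen=window)  # correctness flags of the last `window` samples
    history = []
    for x, y in stream:
        recent.append(model.predict_one(x) == y)  # test on the sample first
        model.learn_one(x, y)                     # then train on the same sample
        history.append(sum(recent) / len(recent))
    return history


# Synthetic stream for illustration: every fourth record is labeled "attack".
stream = [({"f": i}, "attack" if i % 4 == 0 else "normal") for i in range(2000)]
history = prequential_accuracy(stream, MajorityClassLearner(), window=1000)
print(f"final windowed accuracy: {history[-1]:.3f}")
```

Because every sample is scored before the model sees its label, no separate hold-out set is needed, which suits devices that cannot store the full stream.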

Fig. 2. The proposed fog-based network attack detection system and its architecture.

Compared Incremental Classifiers.

In this study, a collection of single incremental classifiers and a Bagging Hoeffding Tree ensemble were compared with the proposed WHTE ensemble model. Each was deployed with its best-tuned parameters in the same experimental environment. The compared classifiers are:

  • Incremental K-Nearest Neighbor (IKNN) [6].

  • Incremental Naïve Bayes (INB) [14].

  • Hoeffding Tree-based Majority Class (HTMC) [15].

  • Hoeffding Tree Naïve Bayes (HTNB) [15].

  • Hoeffding Tree Naïve Bayes Adaptive (HTNBA) [16].

  • Bagging Hoeffding Tree [17].

Weighted Majority Hoeffding Tree Ensemble (WHTE).

The single classifiers listed above may not perform well when the data are heterogeneous, as IoT data are. An ensemble of single classifiers can combine their strengths and offset their weaknesses. We therefore propose an ensemble strategy that combines several single classifiers, specifically distinct types of Hoeffding Tree classifiers (HTMC, HTNB, and HTNBA). Figure 3 summarizes the ensemble technique as a flowchart. The ensemble, called the Weighted Hoeffding Tree Ensemble (WHTE), uses a weighted majority approach. At the beginning, it weighs all of the classifiers' decisions equally [18]. However, it penalizes a classifier that makes a wrong decision by reducing the weight given to its subsequent decisions [19]. The ensemble performs well because the number of mistakes it makes is bounded by a constant multiple of the mistakes made by the best classifier, plus a term that is logarithmic in the number of classifiers. In the original weighted majority algorithm, when an expert makes a mistake, its weight is multiplied by ½. As a result, the error bound is as follows:

$$M\le 2.41(m+\mathrm{log}N)$$
(1)

where \(m\) is the total number of mistakes made by the best classifier, \(M\) is the total number of mistakes made by the ensemble, \(N\) is the number of single classifiers, and the logarithm is taken to base 2. A randomized form of the weighted majority algorithm can be used to tighten this bound: introducing the penalty parameter \(\beta\) allows the constant that multiplies \(m\) to be brought close to one. Hence, for the WHTE ensemble, the error bound can be defined as follows:

$$ M \le \frac{m\,\ln\left(1/\beta\right) + \ln N}{1 - \beta} $$
(2)

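The value of \(\beta\) is set to 0.5. Substituting it into Eq. (2) gives \(\ln(1/0.5)/(1-0.5) \approx 1.39\) for the coefficient of \(m\) and \(1/(1-0.5) = 2\) for the coefficient of \(\ln N\). Hence, the bound on \(M\), counted one sample at a time, becomes: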

$$ M \le 1.39\,m + 2\ln N $$
(3)
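The weighting scheme and the bound in Eq. (2) can be illustrated with a short sketch. It is not the authors' implementation: `WeightedMajority` and `mistake_bound` are hypothetical names, the votes are invented toy data, and the three experts merely stand in for HTMC, HTNB, and HTNBA. The vote shown is the deterministic weighted majority; a randomized variant would instead follow a single expert sampled with probability proportional to its weight.

```python
import math
from collections import defaultdict


class WeightedMajority:
    """Weighted-majority vote over N experts with multiplicative penalty beta."""

    def __init__(self, n_experts, beta=0.5):
        self.beta = beta
        self.weights = [1.0] * n_experts  # all experts start equally trusted

    def predict(self, votes):
        # Sum the weight behind each label and return the heaviest label.
        totals = defaultdict(float)
        for w, label in zip(self.weights, votes):
            totals[label] += w
        return max(totals, key=totals.get)

    def update(self, votes, truth):
        # Shrink the weight of every expert that voted wrongly.
        self.weights = [w * self.beta if v != truth else w
                        for w, v in zip(self.weights, votes)]


def mistake_bound(m, n_experts, beta=0.5):
    """Right-hand side of Eq. (2): (m ln(1/beta) + ln N) / (1 - beta)."""
    return (m * math.log(1 / beta) + math.log(n_experts)) / (1 - beta)


# Toy rounds: (votes of the three experts, true label).
ensemble = WeightedMajority(n_experts=3)
rounds = [(["attack", "normal", "normal"], "attack"),
          (["attack", "normal", "attack"], "attack"),
          (["normal", "attack", "attack"], "normal")]
guesses = []
for votes, truth in rounds:
    guesses.append(ensemble.predict(votes))  # predict before the label is revealed
    ensemble.update(votes, truth)            # then penalize the wrong experts
print(ensemble.weights)
print(round(mistake_bound(m=10, n_experts=3), 2))
```

In the third round the first expert is outnumbered but still wins the vote, because the other two experts have already been penalized for earlier mistakes.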

Performance Metrics.

Multiple metrics are used to evaluate the proposed method in the current study. The details of each metric are given in Table 1.

Table 1. The utilized evaluation metrics for evaluating the proposed method.
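As an illustration of how accuracy, CPU time, and memory use could be measured for a detector, the sketch below uses only the Python standard library. It is an assumption-laden example and not the measurement procedure of this study: `evaluate`, the rule-based detector, and the toy flow records are all hypothetical, and `tracemalloc` reports Python-level allocations only.

```python
import time
import tracemalloc


def evaluate(predict, samples):
    """Return accuracy, CPU seconds, and peak traced memory (MiB) for one run."""
    tracemalloc.start()
    start = time.process_time()
    correct = sum(predict(x) == y for x, y in samples)
    cpu_seconds = time.process_time() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "accuracy": correct / len(samples),
        "cpu_seconds": cpu_seconds,
        "peak_mib": peak_bytes / (1024 ** 2),
    }


# Hypothetical rule-based detector and toy flow records, for illustration only.
samples = [({"bytes": b}, "attack" if b > 500 else "normal") for b in range(0, 1000, 10)]
report = evaluate(lambda x: "attack" if x["bytes"] > 400 else "normal", samples)
print(report)
```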
Fig. 3. The flowchart of the proposed ensemble WHTE method.

3 Results and Discussion

First, the proposed method was evaluated on the NSL-KDD dataset for comparison with the literature. As shown in Table 2, the WHTE ensemble outperformed the other single and ensemble classifiers with an accuracy of 98.0%. Moreover, as an ensemble method, it recorded lower memory usage and CPU time than the Bagging ensemble.

Table 2. Average performance of the WHTE classifier compared with the other single and ensemble classifiers on the NSL-KDD dataset.

Next, the proposed model was evaluated on the ToN-IoT_NetFlow dataset using binary and multi-class classification. As expected, the binary results were much better than the multi-class results, because classifying 10 classes (refer to Fig. 1) incrementally reduces accuracy. Table 3 shows that, for binary classification, the average accuracy of the WHTE ensemble was 100%. Bagging came second with an average accuracy of 99.40%. The average memory use was 0.37 MiB for WHTE, whereas Bagging recorded the highest value of 8.63 MiB, and the HTNBA, HTNB, and HTMC methods used 0.08 MiB each. The average CPU time required to identify all intrusions was just 12.89 s for WHTE, while Bagging needed 77.49 s. The IKNN approach, on the other hand, had the highest complexity. In multi-class classification, WHTE again offered the best trade-off between accuracy and complexity. Table 3 shows that the proposed ensemble had higher accuracy than the single classifiers. Its accuracy was only slightly better than that of Bagging, but its time and memory costs were much lower, which is what lightweight devices require.

Table 3. Average performance of the WHTE classifier compared with the other single and ensemble classifiers on the ToN-IoT_NetFlow dataset using binary and multi-class classification.

For the rest of the analysis, we report only the binary classification results to avoid multiple, duplicate figures. To see the effect of concept drift on each classifier, each technique's incremental accuracy per five thousand records is plotted. From Fig. 4, it can be observed that the INB classifier was sensitive to concept drift and that its accuracy was unstable. By comparison, the rest of the classifiers look much more stable, because the high variance of the INB accuracy dominates the figure's scale.

Fig. 4. The incremental accuracy per 5K sliding window samples. The accuracy was averaged for every 100K samples for clear visualization.
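The smoothing described in the caption amounts to averaging 20 consecutive 5K-sample accuracies per plotted point (100K / 5K = 20). A minimal sketch of that arithmetic follows; the helper names and the synthetic correctness flags are hypothetical, and consecutive non-overlapping blocks are assumed for simplicity.

```python
def window_accuracies(correct_flags, window=5_000):
    """Accuracy of each consecutive block of `window` predictions."""
    blocks = (correct_flags[i:i + window] for i in range(0, len(correct_flags), window))
    return [sum(block) / len(block) for block in blocks]


def average_groups(values, group=20):
    """Mean of each consecutive group of values (20 windows of 5K = 100K samples)."""
    groups = (values[i:i + group] for i in range(0, len(values), group))
    return [sum(g) / len(g) for g in groups]


# Synthetic flags: the first 100K predictions are all correct, the next 100K are 90% correct.
flags = [True] * 100_000 + [i % 10 != 0 for i in range(100_000)]
per_window = window_accuracies(flags)     # one value per 5K samples
per_point = average_groups(per_window)    # one value per 100K samples
print(len(per_window), per_point)
```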

Hence, to show the other classifiers' performance, INB is removed from the illustration in Fig. 5. HTNB and IKNN were more sensitive to changes in the data, and their accuracy fluctuated constantly. Notably, the WHTE classifier showed a stable accuracy of 100% at each sampling interval. Moreover, Bagging, HTMC, and HTNBA were more stable than the rest of the classifiers.

Fig. 5. The incremental accuracy of the utilized classifiers, except for INB, per 5K sliding window samples. The accuracy was averaged for every 100K samples for clear visualization.

For the total CPU time at each sample frequency, a 3D waterfall color surface was drawn, as shown in Fig. 6. The CPU times of IKNN and Bagging were clearly the most affected by the growing number of samples arriving at the system, with the surface color shifting from red to dark blue. For the other classifiers, in contrast, the CPU time rose linearly with the number of samples.

Fig. 6. The incremental CPU time per subset of data samples for each classifier, represented by a 3D waterfall colormap surface.

The current work is compared with related studies in Table 4. The proposed system outperformed the previous studies. Additionally, the current system is lightweight and its complexity is comprehensively analyzed, whereas the previous studies were neither lightweight nor evaluated with these metrics.

Table 4. A comparison between the proposed system and related fog-based attack detection systems.

4 Conclusions

In this study, a lightweight network attack detection system was proposed for the fog devices of the IoMT system. For this purpose, we proposed a fog-based architecture and an incremental ensemble called WHTE, whose performance was compared with that of six other incremental learning methods. The system detected attacks with a high accuracy of 100.0%. In addition, the model is considered lightweight because it uses little memory and CPU time. As a result, the proposed approach surpassed the earlier solutions.