1 Introduction

IoT environments are spreading rapidly due to the growth of connected objects and heterogeneous physical devices equipped with various sensors, actuators, and processors. These devices can exchange information directly or via the Internet without human intervention [1, 5]. An aggregator is an important IoT element, acting as middleware that connects and manages all heterogeneous devices in IoT environments [2]. The cloud is an essential IoT component that provides the compute and storage resources for the data gathered from a huge number of devices [3, 5]. The expansion of IoT is therefore driven by its availability and its increasing deployment in areas such as healthcare systems, smart cities, smart homes, intelligent transportation, and industry [2, 17].

Several works have proposed different IoT architectures. The most frequently used is the three-layer architecture [1, 3, 5], which is no longer adequate for the current development of IoT. The five-layer architecture [3, 5] consists of the perception layer, composed of devices, sensors, and actuators, which collects data from the IoT environment; the transport layer, which manages communication between devices and transfers the collected data to the processing layer; the processing layer, which stores, analyzes, and processes huge amounts of data and provides services to the lower layers; the application layer, which delivers intelligent services to users; and the business layer, which manages the whole IoT system. The cloud- and fog-based IoT is a contemporary architecture that combines edge computing, fog computing, and cloud computing [3, 5]. As depicted in Fig. 1, it consists of a monitoring level, which monitors power, resources, responses, and services; a processing level, which filters and analyzes sensor data; and a storage level, which delivers storage functionalities such as data replication, data distribution, and data storage.

Fig. 1 Cloud- and fog-based IoT architecture

IoT security covers verification, authorization, privacy, access control, information storage, system configuration, and management [2]. The existing methodologies and standards for IoT security suffer from many issues due to the complexity of the systems and the heterogeneity of the devices involved. An IDS can nevertheless be a crucial and highly beneficial security solution for protecting an IoT network [1,2,3, 13, 14, 48, 55]; it can be deployed in IoT alongside other security measures such as encryption techniques, access control, secure routing, and trust-based authentication [8, 12]. IDSs can be categorized into two types: host-based IDS (HIDS) and network-based IDS (NIDS) [6, 7, 10, 15, 32, 53, 54]. Our study focuses on NIDS, which detects network traffic attacks and sends alerts to the network administrator. A NIDS is placed outside the network infrastructure and performs its analysis on a copy of the inline network traffic, so the actual inline network performance is not affected. It first inspects the packets it receives from host- or network-based sensors, then extracts features from them, and finally applies classification algorithms to the extracted features to identify intrusions or anomalies. Moreover, it is essential to boost IDSs with emerging artificial intelligence techniques such as ML and DL [4, 6, 7, 10, 56]. Intrusion detection therefore remains an active research area, since it is a robust approach for securing IoT environments against many attacks such as service scanning, keylogging, denial of service (DoS), and distributed DoS (DDoS) [6, 10, 48,49,50]. A range of ensemble learning, ML, and DL methods have been incorporated into enhanced IDS proposals. Despite these efforts, many problems remain to be solved, such as real-time detection, class imbalance, data quality, high dimensionality, huge data volume, and time performance [5].

The main goal of this work is to address some intrusion detection limits by improving the classification performance. To that end, we validate an anomaly-based IDS model using CatBoost [42, 43], an efficient open-source package combining GB and DT algorithms. Our contribution is summarized in two points. The first is to increase the accuracy and precision of the IDS, and the second is to reduce the detection time. For that, we use the CatBoost algorithm, in particular gradient boosting over decision trees, and benefit from a library with multi-GPU support to handle huge data volumes and reduce processing and detection time. Furthermore, CatBoost handles categorical features through its CatBoost encoder and mitigates class imbalance by optimizing the detection of minority classes using target statistics and gradient boosting. We tested the model and provide a comparative study on four datasets, NSL-KDD [45], BoT-IoT [46], IoT-23 [40], and Edge-IIoTset [23], to confirm its stability and determine the effectiveness of our solution. The experimental results show that our model performs well and makes reliable decisions.

The remainder of this paper is structured as follows. Section 2 reviews related work on intrusion detection approaches that rely on ML, DL, and ensemble learning techniques. Section 3 describes the essential steps of the proposed design and the solutions suggested to validate our intrusion detection approach. The experimental evaluation and results are discussed in Sect. 4. Finally, the paper closes with a conclusion and future work.

2 Related works

This section reviews recent related work on IDSs that integrate ML and DL algorithms to enhance IoT security.

IoT security is a crucial issue because of the heterogeneity of IoT systems and the insufficiency of the security measures embedded in devices [2, 5, 48]. Current IoT security relies on traditional mechanisms such as authentication, secure routing, encryption, key management protocols, authorization frameworks, IDSs, and other approaches [1, 2]. However, they are insufficient to properly secure IoT [1, 10], and most measures do not account for the limited energy and memory resources of devices [3]. IoT architectures are distributed: sensors and devices need to communicate and aggregate data before reaching the Internet, which they then access via a smart gateway using the user datagram protocol (UDP), the transmission control protocol, the address resolution protocol, the IPv6 Internet Control Message Protocol (ICMP), the Internet group management protocol, or the Reverse Address Resolution Protocol [3, 16, 17]. Intrusion detection, in turn, is a defense mechanism used to monitor traffic and detect vulnerabilities within the network infrastructure; it can identify and stop malicious activities [6, 7, 51, 52].

In the literature, as depicted in Table 1, many researchers have devoted their efforts to enhancing intrusion detection to protect the IoT environment. Misra et al. [24] and Kasinathan et al. [25] presented novel security architectures for detecting DDoS attacks in IoT. In 2013, Raza et al. [28] created an IDS called SVELTE to secure IoT, with an integrated mini-firewall, using RPL as the routing protocol in IPv6 over Low-power Wireless Personal Area Network (6LoWPAN) networks. In 2015, Cervantes et al. [29] benchmarked SVELTE and presented INTI (Intrusion detection of SiNkhole attacks on 6LoWPAN for InterneT of ThIngs), a system for detecting sinkhole attacks on 6LoWPAN for IoT; the simulation results showed that INTI has lower false positive and false negative rates than SVELTE. In 2016, Sonar et al. [26] proposed an intrusion detection approach to secure IoT against DDoS. Other works explored the effectiveness of deploying ML and DL algorithms in IDSs to improve the security of IoT systems: Hodo et al. [11, 13] proposed an ANN-based IDS model to classify threats in IoT networks, achieving over 99% accuracy. In 2017, Fadlullah et al. [9] surveyed DL techniques for evolving machine intelligence toward intelligent network traffic analysis, reviewing a set of ML and IDS contributions for IoT security. At the same time, Diro et al. [20] developed a distributed DL attack detection scheme for IoT security; the distributed model detects attacks better than centralized ones, with accuracy increasing from 96 to over 99%. In 2018, Prabavathy et al. [21] proposed an IDS design based on cognitive fog computing for IoT environments, implemented with the OS-ELM algorithm at distributed fog nodes, achieving 97.36% accuracy with a reduced false alarm rate of 0.37%. One year later, Verma et al. [19] compared the performance of several supervised ML algorithms to select a reliable classifier for IoT security; they proposed an IDS model based on ensemble learning and showed that the Gradient Boosting Machine (GBM) performs best, with a sensitivity of 99.53%. Furthermore, Chaabouni et al. [18] proposed a OneM2M IDS based on edge ML for IoT security; their experiments reported a detection rate of 93.80%, accuracy of 92.32%, precision of 92.95%, FPR of 1.53%, and a CPU training time of 9280 ms. Al-kasassbeh et al. [31] showed that the LightGBM algorithm achieved almost 100% accuracy, demonstrating the efficiency of this ML algorithm over DL strategies. In 2021, Ullah et al. [12] presented a DL-based IDS using a convolutional neural network for binary and multiclass classification; the model achieves a minimum detection rate of around 99.7%. From these related works, it is clear that robust intrusion detection approaches have been achieved using GBM, extreme gradient boosting (XGB), and LightGBM.

Table 1 Classification and comparison study of IDSs for IoT security

All gradient-boosting-based methods are extremely powerful optimization algorithms. Moreover, according to the comparison by Abdullahi et al. [43], CatBoost is the most efficient; it outperforms existing GBDT implementations such as GBM, XGB, LightGBM, and H2O. CatBoost combines their strong points: it implements GB using symmetric binary DTs as base predictors [42, 44], which apply the same splitting criterion across an entire level of the tree, making the model less prone to overfitting and faster to execute at test time [36].

CatBoost also offers a very efficient way to encode categorical features and provides a library with multi-GPU support [42]. To evaluate IDS performance, many datasets are available, for instance KDD99, UNSW-NB15, Kyoto 2006+, NSL-KDD, BoT-IoT, IoT-23, IoT Network Intrusion, MQTT-IoT-IDS2020, and CICIDS2017 [7, 40, 45, 46]. The most commonly used evaluation metrics are ACC, recall, FPR, FNR, precision, and f1-score.

3 Optimized intrusion detection model

This section details various solutions to validate our intrusion detection approach for IoT environment security.

3.1 Proposed design

Our contribution aims to propose and implement an optimized model improving detection rate, accuracy, and processing time. The architecture of the proposed model is illustrated in Fig. 2.

Fig. 2 Proposed design of our IDS approach for IoT security

This model validates an optimized IDS based on the CatBoost classifier, which combines GB and DT algorithms. Our approach can therefore reduce the gradient estimation bias and improve the generalization capability. The training stage is carried out on GPU. As depicted in Fig. 2, our optimized model is divided into four essential steps (a minimal code sketch of the pipeline is given after Table 4):

  • Step 1 Data pre-processing:

    The data is first inspected and prepared: we identify and remove all inconsistent values, such as infinite and NaN values.

  • Step 2 Feature engineering:

    The feature vectors \((X_{j}^{1}, X_{j}^{2}, \ldots, X_{j}^{n})\), \(j = 1, \ldots, m\), and the target labels \((y_{1}, y_{2}, \ldots, y_{m})\) are defined and prepared. The categorical values \(X_{j}^{i}\) are encoded with the CatBoost encoder, which greedily applies target statistics (TS) computed over the whole training set, using the average label value, in order to reduce overfitting and avoid target leakage and normalization problems [44]. The features are then transformed and combined; the ordering approach builds a strong predictor from each categorical feature based on ordered target statistics.

  • Step 3 Training and building of the model

    The train and test data are reconstructed as shown in Table 3, and the hyperparameters, such as max depth, iterations, task type, estimation method, loss function, boosting type, and eval metric, are identified. All hyperparameters are optimized to obtain the best performance, as shown in Table 4. The training process is implemented using the CatBoost ensemble ML classifier with GPU processing.

  • Step 4 Intrusion detection:

Table 2 The confusion matrix
Table 3 Data reconstructions

The built model predicts attacks as the positive class. It is evaluated and validated using the performance metrics derived from the confusion matrix (Table 2), such as ACC, recall, precision, FPR, FNR, and f1-score.

Table 4 Optimized Catboost hyperparameters
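As a minimal, hedged sketch of these four steps, the pipeline below pre-processes a dataset, marks categorical features for CatBoost's target-statistics encoding, trains on a 50% split with GPU support, and predicts on the held-out data. The file path, column names, and the availability of a GPU are assumptions; the hyperparameters follow the values reported later in this paper (200 iterations, tree depth 3).

```python
# Hypothetical pipeline sketch of the four steps (Sect. 3.1); the path and
# column names are placeholders, not the exact ones used in our experiments.
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split

# Step 1 - data pre-processing: drop inconsistent (infinite/NaN) values.
df = pd.read_csv("bot_iot_sample.csv")                       # hypothetical path
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# Step 2 - feature engineering: feature vector X, target label y, and the list
# of categorical columns that CatBoost will encode with target statistics.
target_col = "attack"                                        # hypothetical label column
X, y = df.drop(columns=[target_col]), df[target_col]
cat_features = [c for c in X.columns if X[c].dtype == "object"]

# Step 3 - training and building of the model: 50/50 split, depth-3 trees, GPU.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=42)
model = CatBoostClassifier(iterations=200, depth=3, loss_function="Logloss",
                           eval_metric="Accuracy", task_type="GPU", verbose=50)
model.fit(Pool(X_train, y_train, cat_features=cat_features),
          eval_set=Pool(X_test, y_test, cat_features=cat_features))

# Step 4 - intrusion detection: predict attacks on the held-out data.
y_pred = model.predict(Pool(X_test, cat_features=cat_features))
```

Declaring cat_features in the Pool lets CatBoost apply its internal categorical encoding, which plays the role of the CatBoost encoder described in Step 2.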

3.2 CatBoost implementation

Assume that we observe the data D with m samples and n features:

$$D = \left\{ \left( X_{j}^{i}, y_{j} \right) \right\}, \quad i = 1, \ldots, n, \; j = 1, \ldots, m$$
(1)

Each sample has an n-dimensional feature vector \(X_{j} \in \mathbb{R}^{n}\) and a corresponding label \(y_{j} \in \mathbb{R}\). The symmetric DT is defined in Eq. 2 [44]:

$$h\left( a \right) = \sum\limits_{k = 1}^{K} w_{k} \, \mathbf{1}_{\left\{ a \in R_{k} \right\}}$$
(2)

\(h(a)\) is constructed as a superposition of the estimated responses over the disjoint regions \(R_{1}, \ldots, R_{K}\) (the tree leaves), where \(w_{k}\) is the estimated value of the predicted class label in region \(k\) and \(\mathbf{1}_{\{a \in R_{k}\}}\) is the indicator function defined in Eq. 3

$$\mathbf{1}_{\left\{ a \in R_{k} \right\}} = \left\{ \begin{array}{ll} 1 & \text{if } a \in R_{k} \\ 0 & \text{if } a \notin R_{k} \end{array} \right.$$
(3)
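To make Eqs. 2 and 3 concrete, the toy snippet below evaluates a two-leaf symmetric tree; the regions and leaf weights are invented purely for illustration.

```python
# Toy illustration of Eqs. 2-3: a tree as a sum of weighted region indicators.
def indicator(a, region):                 # Eq. 3: 1 if a lies in R_k, else 0
    lo, hi = region
    return 1 if lo <= a < hi else 0

def h(a, regions, weights):               # Eq. 2: h(a) = sum_k w_k * 1{a in R_k}
    return sum(w * indicator(a, r) for r, w in zip(regions, weights))

regions = [(0.0, 0.5), (0.5, 1.0)]        # disjoint leaf regions R_1, R_2 (invented)
weights = [0.1, 0.9]                      # estimated leaf values w_1, w_2 (invented)
print(h(0.7, regions, weights))           # 0.7 falls in R_2, so h = 0.9
```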

Training with GB aims to minimize the expected loss \({\mathcal{L}}(F) := {\mathbb{E}}\,L\left( y, F(x) \right)\), where \(L(\cdot , \cdot)\) is a smooth loss function and \(F\) is the approximating function. \(F^{t}\) denotes the series of approximations defined in Eq. 4 [44].

$$F^{t} : \mathbb{R}^{n} \to \mathbb{R}, \quad F^{t} = F^{t - 1} + \alpha h^{t}$$
(4)

\(\alpha\) is a step size and \(h^{t}\) is a base predictor chosen from the family of functions \(H\) according to Eq. 5 [44].

$$h^{t} = \arg \min_{h \in H} {\mathcal{L}}\left( F^{t - 1} + h \right) = \arg \min_{h \in H} {\mathbb{E}}\, L\left( y, F^{t - 1}(x) + h(x) \right)$$
(5)

This minimization problem is solved using the negative gradient \(-g^{t}(x, y)\), with \(g^{t}\) defined in Eq. 6 [44].

$$g^{t}\left( x, y \right) = \left. \frac{\partial L\left( y, s \right)}{\partial s} \right|_{s = F^{t - 1}(x)}$$
(6)

\(h^{t}\) is chosen so that \(h^{t}(x)\) approximates \(-g^{t}(x, y)\); it is the DT function that minimizes the expected squared deviation, which gives Eq. 7 [44].

$$h^{t} = \arg \min_{h \in H} {\mathbb{E}}\left( - g^{t}\left( x, y \right) - h(x) \right)^{2}$$
(7)
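As a rough illustration of Eqs. 4–7, the snippet below performs a single boosting step on synthetic data: it computes the negative gradient of the logistic loss and fits a depth-limited regression tree to it as the base predictor \(h^{t}\). The loss, data, and step size are illustrative assumptions, not the exact CatBoost internals.

```python
# One gradient-boosting step (Eqs. 4-7) on synthetic data with a shallow
# regression tree as the base predictor h^t.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                      # synthetic feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(float)          # synthetic binary target

F_prev = np.zeros(len(y))                          # F^{t-1}: here the zero model
p = 1.0 / (1.0 + np.exp(-F_prev))                  # current predicted probability
neg_gradient = y - p                               # -g^t(x, y) for the logistic loss (Eq. 6)

h_t = DecisionTreeRegressor(max_depth=3).fit(X, neg_gradient)  # least-squares fit (Eq. 7)
alpha = 0.1                                        # step size
F_t = F_prev + alpha * h_t.predict(X)              # updated model F^t (Eq. 4)
```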

In practice, this expectation is approximated over the finite dataset D. Moreover, CatBoost addresses prediction shift with ordered boosting and handles categorical features with greedy target statistics (TS): \(\hat{x}_{j}^{i}\) estimates the expected target \(y\) in the category of the jth training example \(x_{j}^{i}\), as defined in Eq. 8.

$$\hat{x}_{j}^{i} = {\mathbb{E}}\left( y \mid x^{i} = x_{j}^{i} \right) \approx \frac{\sum_{k = 1}^{m} \mathbf{1}_{\left\{ x_{k}^{i} = x_{j}^{i} \right\}} \cdot y_{k} + \alpha p}{\sum_{k = 1}^{m} \mathbf{1}_{\left\{ x_{k}^{i} = x_{j}^{i} \right\}} + \alpha}$$
(8)

Here \(p\) is set to the average target value over the sample and \(\alpha > 0\) is a smoothing parameter. Following the standard structure of an IDS, our proposed approach is designed in four steps: data pre-processing, feature engineering, training and building of the model, and intrusion detection. It integrates a GB classifier, which yields efficient decisions; for binary classification in particular, DT-based models are among the most efficient ML classifiers [34, 35]. In practice, datasets include both numerical and categorical features, and the problem of categorical features is well handled by the numerical encoding that CatBoost provides.
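A minimal pandas sketch of the greedy TS of Eq. 8 for a single categorical column is shown below; the column names and values are invented, and in practice CatBoost computes these statistics internally (over random permutations) rather than requiring manual encoding.

```python
# Greedy target statistics (Eq. 8) for one hypothetical categorical column.
import pandas as pd

df = pd.DataFrame({"proto": ["tcp", "udp", "tcp", "icmp", "tcp", "udp"],
                   "label": [1, 0, 1, 0, 0, 1]})

alpha = 1.0                                  # smoothing parameter (> 0)
p = df["label"].mean()                       # prior p: average target value

counts = df.groupby("proto")["label"].agg(["sum", "count"])
ts = (counts["sum"] + alpha * p) / (counts["count"] + alpha)   # Eq. 8
df["proto_ts"] = df["proto"].map(ts)         # numeric encoding of the category
print(df)
```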

We evaluate our model using recall, accuracy, and precision (a code sketch of these metrics follows Eq. 13). Accuracy is the proportion of correctly recognized samples out of the total number of samples. Precision is the proportion of correctly classified positive items out of all TP (True Positive) and FP (False Positive) predictions. Recall is the number of TP divided by the total number of TP and FN (False Negative). We also calculate the FPR and FNR: the FPR (False Positive Rate) is the proportion of normal samples that are classified as attacks, while the FNR (False Negative Rate) is the proportion of attack samples that are classified as normal.

  • True Positive: the model predicts an attack and the sample is actually an attack.

  • True Negative: the model predicts normal and the sample is actually normal.

  • False Positive: the model predicts an attack but the sample is actually normal.

  • False Negative: the model predicts normal but the sample is actually an attack.

    $$Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}$$
    (9)
    $$Precision = \frac{TP}{{TP + FP}}$$
    (10)
    $$Recall = \frac{TP}{{TP + FN}}$$
    (11)
    $$FNR = \frac{FN}{{FN + TP}}$$
    (12)
    $$FPR = \frac{FP}{{FP + TN}}$$
    (13)
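The following helper, a small sketch based on scikit-learn's confusion matrix, computes the metrics of Eqs. 9–13 for a binary labeling; the example labels are placeholders.

```python
# Metrics of Eqs. 9-13 computed from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

def ids_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy":  (tp + tn) / (tp + tn + fp + fn),   # Eq. 9
        "precision": tp / (tp + fp),                    # Eq. 10
        "recall":    tp / (tp + fn),                    # Eq. 11
        "fnr":       fn / (fn + tp),                    # Eq. 12
        "fpr":       fp / (fp + tn),                    # Eq. 13
    }

print(ids_metrics([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))    # toy example
```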

4 Experimental evaluation and results

4.1 Datasets and simulation setup

The evaluation of an IDS is an essential issue, and the performance of any classifier depends on the dataset used to train and test the model. In this paper, four datasets are used:

· Edge-IIoTset [23] The dataset was generated using a specially designed IoT/IIoT testbed with a prominent representative set of protocols, sensors, and cloud/edge configurations. Data is generated from several sensors, such as humidity, temperature, water level, heart rate, pH, etc.

· BoT-IoT [46] was developed and labeled for multiclass purposes. The label features indicate the attack flow, the attack category, and the subcategory. BoT-IoT is highly imbalanced, with 99.99% attack records versus 0.01% benign ones, and has a total of 46 features, including the target variable.

· IoT-23 [40] contains real traffic captured by the Avast AIC laboratory in partnership with the Czech Technical University in Prague. IoT-23 contains twenty malware captures from different IoT devices and three captures of benign traffic.

· NSL-KDD [45] is an improved version of KDD99, obtained by eliminating redundant and duplicate records. In the present study, we use 20% of NSL-KDD, taking into account all features except the target vector.
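Before training, the class balance of a dataset can be checked with a few lines of pandas; the file path and label column below are hypothetical, but the check reflects, for instance, the strong BoT-IoT imbalance noted above.

```python
# Hypothetical sketch: inspect the size and class balance of a dataset.
import pandas as pd

df = pd.read_csv("BoT-IoT.csv")                       # hypothetical path
print(df.shape)                                       # number of records and features
print(df["attack"].value_counts(normalize=True))      # share of attack vs. benign records
```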

The experimental evaluation of our approach is performed on a multi-core Intel® Core™ i7-1165G7 @ 2.80 GHz with an Nvidia® GeForce MX330 GPU (PhysX enabled), 8 GB of RAM, and a 64-bit operating system. The model is implemented in JupyterLab under Python 3.9.7 with CatBoost 1.0.3, the pandas, NumPy, and scikit-learn libraries, and the GPU driver.

  • In the learning phase, we build the model following the steps described in Sect. 3.1 and partition the data into two portions of 50%-50%. For training, we use 200 iterations, train the model on 50% of the dataset, and let CatBoost create two random permutations of the training data. Furthermore, we use the gradient to compute the leaf values, with a maximum tree depth of three. The CatBoost authors report that K-fold partitioning with K = 2 works well for most datasets since it does not suffer from conditional shift [44]; it also allows us to keep more data for the prediction tests.

  • In the validation phase, following the steps in Fig. 3, we use all of the data (100%) to evaluate the model (a timing sketch is given after Fig. 3). First, to select the most influential features, we use the CatBoost encoder to handle the categorical features; then we use the model to predict the attacks over all the data.

Fig. 3 Validation process
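As a hedged sketch of how the learning and detection times reported below can be measured, the snippet times a CatBoost fit on a 50% split and a prediction over all the data; the data is synthetic and the GPU task type is optional.

```python
# Timing sketch for the learning and validation phases; synthetic data.
import time
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = rng.integers(0, 2, size=10_000)

model = CatBoostClassifier(iterations=200, depth=3, verbose=False)  # add task_type="GPU" if available

t0 = time.perf_counter()
model.fit(X[:5_000], y[:5_000])              # learning phase on a 50% split
fit_time = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict(X)                             # detection over all the data (Fig. 3)
detect_time = time.perf_counter() - t0
print(f"fit: {fit_time:.2f} s, detect: {detect_time:.3f} s")
```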

4.2 Experimental results and discussion

  • Binary classifications:

We apply the datasets to our model, implementing the process steps defined in Fig. 2. First, we pre-process the datasets and then define, extract, and encode the feature vectors and target labels; all features are used in the first training. Subsequently, we define train_size, test_size, and the hyperparameters in Table 4 to train and test our model using the open-source CatBoost platform. In our experiments, we used 160 iterations and created two random permutations of the training data. We used the gradient to compute the leaf values with a maximum tree depth of three. The resulting complexity of this operation is O(2^n). After testing, we obtain the following results (Fig. 4).

Fig. 4 Confusion matrix of prediction on BoT-IoT, IoT-23, NSL-KDD datasets

Using the BoT-IoT dataset, we obtain the good results reported in Table 5 and Fig. 5: our model achieves the highest intrusion detection performance, with accuracy, precision, and recall around 100%. The confusion matrices in Fig. 4 show that the model achieves 0 FPR and 0 FNR. Figure 6 reports the learning and detection times: seven iterations and 4.25 s are needed to fit the model on GPU, and 0.865 s to detect attacks over all the data. For these results, we used 42 features to train and test the model, but only the 25 features that contribute to the performance were used in validation, as presented in Fig. 7 and Table 6 (a sketch of how such influential features can be extracted follows Table 6).

Table 5 Performance measures result on Edge-IIoT, BoT-IoT, IoT-23, NSL-KDD
Fig. 5 Performance evaluation of our model

Fig. 6 Learning time and detection time of different datasets

Fig. 7 Influential features of detection attack on BoT-IoT dataset

Table 6 BoT-IoT Features used in validation
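A small sketch of how such influential features can be extracted with CatBoost's built-in importance scores is given below; the data is synthetic and the zero-importance threshold is only illustrative.

```python
# Extracting influential features via CatBoost feature importance (synthetic data).
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1_000, 10)), columns=[f"f{i}" for i in range(10)])
y = (X["f0"] + X["f1"] > 0).astype(int)

model = CatBoostClassifier(iterations=50, depth=3, verbose=False).fit(X, y)
imp = model.get_feature_importance(prettified=True)      # DataFrame: Feature Id, Importances
kept = imp.loc[imp["Importances"] > 0, "Feature Id"].tolist()
print(f"{len(kept)} influential features: {kept}")
```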

We use IoT-23 to compare and confirm the performance of the model, and the obtained results do confirm it. In Table 5 and Fig. 5, accuracy, precision, and recall are all around 99.9%. Conversely, the error is minimal and converges to zero, with an FNR of 0.00002 and an FPR of 0.00018, as shown by the confusion matrix in Fig. 4. Moreover, Fig. 6 shows that only 12 s are needed to fit the model on GPU and 0.763 s to detect attacks over the whole dataset. In addition, 18 features were used for these results, but only 12 of them influenced detection. We then use 20% of NSL-KDD to further confirm and compare the performance of our model. As shown in Table 5, Fig. 4, and Fig. 5, the model still performs well; the best iteration requires a total of 3.85 s to fit and a detection time of just 0.108 s.

Again, as discussed above, the results confirm the model's performance, with 99.8% accuracy, precision, and recall, an FPR of 0.00068, and an FNR of 0.00082. We also tested the model on Edge-IIoT and obtained the highest results in Table 5, which again confirm the performance of the model: accuracy, precision, and recall all reach 100%. On the other hand, the error is zero, with 0 FNR and 0 FPR, as shown by the confusion matrix in Fig. 4. Moreover, Fig. 6 shows that only 1 s is needed to fit the model on GPU and 0.146 s to detect attacks over the whole dataset.

  • Multiclass classification in Edge-IIoT:

The results of multiclass classification on Edge-IIoT, shown in Table 7 and Fig. 8, prove that the model produces a detection rate comparable to that of the binary classification model. The model maintains a comparatively high level of precision and accuracy, 100% during training and 99.27% during validation.

Table 7 Performance measures of categorical classification result on Edge-IIoT
Fig. 8 Performance evaluation of the model on Edge-IIoT

The FPR and FNR of the model are very low. Additionally, as shown in Fig. 9, it achieves a higher detection rate for the normal class and for malicious classes such as DDoS ICMP, DDoS UDP, MITM, Password, SQL injection, Uploading, and Vulnerability Scanner, with 100% precision and recall, compared to other malicious classes such as DDoS HTTP, Port Scanning, Ransomware, Backdoor, Fingerprinting, and XSS, which have a recall of around 98%. Moreover, the model performs in record time, with a detection time of 0.44 s over all the data. We conclude that the model remains performant and identifies anomalies in multiclass classification (a minimal multiclass sketch follows Fig. 9).

Fig. 9 Confusion matrix of multiclass prediction on Edge-IIoT
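A minimal multiclass sketch is shown below: CatBoost trained with the MultiClass loss and per-class precision/recall reported with scikit-learn; the data and class names are synthetic stand-ins for the Edge-IIoT labels.

```python
# Multiclass CatBoost sketch with per-class precision and recall (synthetic data).
import numpy as np
from catboost import CatBoostClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
classes = ["Normal", "DDoS_UDP", "Password", "XSS"]        # invented class names
X = rng.normal(size=(2_000, 15))
y = rng.choice(classes, size=2_000)

model = CatBoostClassifier(iterations=100, depth=3,
                           loss_function="MultiClass", verbose=False)
model.fit(X, y)
print(classification_report(y, model.predict(X).ravel()))  # per-class precision/recall
```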

The binary and multiclass classifications were performed with CatBoost, i.e., gradient boosting over decision trees, trained and validated on GPU. The model took between 1 and 12 s to train and between 0.1 and 0.8 s to validate, a record time. The trained model is then validated using only the influential features, which further reduces the computation time and increases the IDS's accuracy and precision. The model benefits from target statistics combined with gradient boosting and the CatBoost encoder to handle the huge data volume and to mitigate class imbalance by optimizing the detection of minority classes. Furthermore, the use of GPU clearly benefits this model.

We tested the model on different datasets for comparison. The model proved to be fast and achieved a good detection rate, so integrating a GPU at the fog computing level can potentially minimize the intrusion detection time and ensure responsiveness. According to the performance comparison presented in Table 8 and Fig. 10, our proposed model achieves the highest performance and outperforms all other IDSs in this study in terms of robustness and time performance.

Table 8 Comparison of some intrusion detection methods on BoT-IoT, IoT-23, NSL-KDD datasets
Fig. 10 Comparison of performance and processing time of ML and DL intrusion detection

5 Conclusion and future work

Intrusion detection is ideal for reinforcing IoT security against attacks, especially when the IDS is integrated into fog computing. This paper presents an optimized intrusion detection model for IoT security based on an anomaly detection method, aiming to enhance IDS accuracy and time performance. The experiments carried out on multiple datasets and the performance comparisons show that our model achieves the highest and most robust performance with the lowest time cost. The model benefits from target statistics combined with gradient boosting and the CatBoost encoder, which handle the huge data volume and mitigate class imbalance by optimizing the detection of minority classes; the use of GPU further benefits the model. According to this study, the suggested model can contribute to building an efficient IoT network intrusion detection system with a high detection rate. In addition, this work confirms that CatBoost is a powerful ML algorithm. For future work, we plan to combine Blockchain with machine learning methods to further reinforce security in IoT environments.