1 Introduction

Embedded devices are interconnected with each other and further connected to the Internet to form the Internet of Things (IoT) [1, 2]. The smart home paradigm exploits the enormous capabilities of IoT technologies to develop intelligent appliances and applications such as smart televisions, smart fridges, smart lighting, and smart security alarm systems [3, 4]. Primarily, IoT-enabled devices in homes autonomously communicate with residents and with other IoT-enabled devices over the Internet. Unfortunately, invalidated assumptions and incompatibilities in the integration of multiple IoT technologies, standards, proprietary communication protocols, and heterogeneous platforms have exposed smart homes to critical security vulnerabilities [5]. Most of the IoT devices and applications in use today were developed with little or no consideration for cybersecurity [6]. Hence, IoT devices tend to be easier to compromise than traditional computers [7].

Cyber attackers exploit the lack of basic security protocols in IoT devices to gain unauthorized remote access and control over insecure network nodes [8, 9]. Compromised IoT devices (i.e., bots) in smart homes can be connected to a master bot in a remote location. This kind of connection helps hackers form coordinated networks of bots (botnets) [10, 11]. Botnets launch large-scale distributed denial of service (DDoS) attacks with massive traffic volume [10, 12]. Other botnet attack scenarios include port scanning, operating system (OS) fingerprinting, information theft, and keylogging [13]. Existing security solutions that are primarily designed for traditional computer networks may not be efficient for IoT botnet detection in smart home scenarios owing to the unique characteristics of IoT devices and their systems [13].

Traffic patterns of different botnet attack scenarios can be detected in IoT network traffic data using machine learning (ML) approaches. Various shallow learning techniques have been proposed to detect botnet attacks in IoT networks. These include support vector machine (SVM) [13,14,15,16,17,18,19,20], decision trees (DT) [14, 15, 19, 21,22,23,24,25], random forest (RF) [8, 15, 18, 21, 23, 26], bagging [15], k-nearest neighbor (k-NN) [15, 19, 21, 23, 24, 26], artificial neural network (ANN) [22, 25, 27], Naïve Bayes (NB) [22, 23, 25], isolation forest [16], feedforward neural network (FFNN) [18], k-means clustering [28, 29], and association rule mining (ARM) [25]. However, data generation in IoT networks is expected to be big in terms of volume, variety, and velocity [30]. Therefore, ML techniques with shallow network architectures may not be suitable for botnet detection in big IoT data applications.

Deep learning (DL) offers two main advantages over shallow machine learning: hierarchical feature representation with automatic feature engineering, and improved classification performance owing to deeper network architectures. DL techniques have demonstrated good capability for botnet detection in big IoT data applications. State-of-the-art DL techniques for IoT botnet detection include deep neural network (DNN) [14], convolutional neural network (CNN) [8, 31,32,33,34], recurrent neural network (RNN) [13, 29], long short-term memory (LSTM) [13, 29, 35, 36], and bidirectional LSTM (BLSTM) [36, 37]. The classification performance of DL methods depends on the choice of optimal model hyperparameters. However, in previous studies, model hyperparameters were often selected by trial and error.

Gated recurrent unit (GRU) is a variant of RNN that is suitable for large-scale sequential data processing [38]. Bidirectional GRU (BGRU) has the unique advantage of accessing both past and future information to make accurate decisions, enabling efficient classification performance with lower computational demands [39]. To the best of our knowledge, previous research has not investigated the capability of BGRU for botnet detection in a smart home scenario. In this paper, we aim to find the optimal hyperparameters for efficient deep BGRU-based botnet detection in IoT-enabled smart homes. The main contributions of this paper are summarized as follows:

  a. A methodology is proposed to determine the most suitable hyperparameters (activation function, epoch, hidden layer, hidden unit, batch size, and optimizer) for optimal BGRU-based multi-class classification.

  b. A deep BGRU model is designed based on the selected hyperparameters to distinguish normal network traffic from IoT botnet attack scenarios.

  c. The proposed methodology is implemented, and the deep BGRU model is developed with the Bot-IoT dataset.

  d. The performance of the deep BGRU model is evaluated based on training loss, validation loss, true positive rate (TPR), false positive rate (FPR), Matthews correlation coefficient (MCC), and training time.

The remainder of this paper is organized as follows: Sect. 2 describes the proposed methodology for the selection of optimal BGRU hyperparameters and its use in developing a deep BGRU model for efficient IoT botnet detection; Sect. 3 presents the results of extensive model simulations; and Sect. 4 concludes the paper.

2 Deep BGRU Method for Botnet Detection in IoT Networks

In this section, we describe the concept of BGRU, the proposed methodology for optimal BGRU hyperparameters, and the development of efficient deep BGRU classifiers for botnet detection in the context of a smart home. The overview of the framework is shown in Fig. 1.

Fig. 1 Optimal hyperparameters of BGRU

2.1 Bidirectional Gated Recurrent Unit

GRU is a variant of RNN. This hidden unit achieves performance similar to LSTM with a simplified gating mechanism and lower computational requirements [40]. Unlike LSTM, GRU discards the memory unit and replaces the input and forget gates with an update gate. Figure 2 shows the standard structure of a GRU. A GRU has two gates, namely, the reset gate (\( r_i \)) and the update gate (\( z_i \)). These gates depend on the past hidden state (\( h_{(t-1)} \)) and the present input (\( x_t \)). The reset gate determines whether the past hidden state should be ignored or not, while the update gate changes the past hidden state to a new hidden state (\( \tilde{h}_{i} \)). The past hidden state is ignored when \( r_{i} \simeq 0 \), so that information that is not relevant to the future is dropped and a more compact representation is obtained. The update gate regulates the amount of information that is carried over from the past hidden state to the present hidden state.

Fig. 2 The architecture of GRU

GRU is a unidirectional RNN: it employs a single hidden layer, and its recurrent connections run in the backward time direction only. Consequently, GRU cannot update the present hidden state based on information in the future hidden state. Interestingly, BGRU updates its current hidden state based on both past and future hidden state information [41]. A single BGRU has two hidden layers, both connected to the input and output. The first hidden layer establishes recurrent connections between the past hidden states and the present hidden state in the backward time direction, while the second hidden layer establishes recurrent connections between the present hidden state and the future hidden states in the forward time direction. The BGRU parameters are computed by (1)–(9):

$$ \overleftarrow{r}_i = \sigma\left( \left[ \overleftarrow{\varvec{W}}_r \varvec{x} \right]_i + \left[ \overleftarrow{\varvec{U}}_r \overleftarrow{\varvec{h}}_{(t-1)} \right]_i + \overleftarrow{\varvec{b}}_r \right), $$
(1)
$$ \overleftarrow{z}_i = \sigma\left( \left[ \overleftarrow{\varvec{W}}_z \varvec{x} \right]_i + \left[ \overleftarrow{\varvec{U}}_z \overleftarrow{\varvec{h}}_{(t-1)} \right]_i + \overleftarrow{\varvec{b}}_z \right), $$
(2)
$$ \overleftarrow{\tilde{h}}_i^{(t)} = \phi\left( \left[ \overleftarrow{\varvec{W}} \varvec{x} \right]_i + \left[ \overleftarrow{\varvec{U}} \left( \overleftarrow{\varvec{r}} \odot \overleftarrow{\varvec{h}}_{(t-1)} \right) \right]_i \right), $$
(3)
$$ \overleftarrow{h}_i^{(t)} = \overleftarrow{z}_i \, \overleftarrow{h}_i^{(t-1)} + \left( 1 - \overleftarrow{z}_i \right) \overleftarrow{\tilde{h}}_i^{(t)}, $$
(4)
$$ \overrightarrow{r}_i = \sigma\left( \left[ \overrightarrow{\varvec{W}}_r \varvec{x} \right]_i + \left[ \overrightarrow{\varvec{U}}_r \overrightarrow{\varvec{h}}_{(t+1)} \right]_i + \overrightarrow{\varvec{b}}_r \right), $$
(5)
$$ \overrightarrow{z}_i = \sigma\left( \left[ \overrightarrow{\varvec{W}}_z \varvec{x} \right]_i + \left[ \overrightarrow{\varvec{U}}_z \overrightarrow{\varvec{h}}_{(t+1)} \right]_i + \overrightarrow{\varvec{b}}_z \right), $$
(6)
$$ \overrightarrow{\tilde{h}}_i^{(t)} = \phi\left( \left[ \overrightarrow{\varvec{W}} \varvec{x} \right]_i + \left[ \overrightarrow{\varvec{U}} \left( \overrightarrow{\varvec{r}} \odot \overrightarrow{\varvec{h}}_{(t+1)} \right) \right]_i \right), $$
(7)
$$ \overrightarrow{h}_i^{(t)} = \overrightarrow{z}_i \, \overrightarrow{h}_i^{(t+1)} + \left( 1 - \overrightarrow{z}_i \right) \overrightarrow{\tilde{h}}_i^{(t)}, $$
(8)
$$ \tilde{y}_i = \vartheta\left( \varvec{W}_y \overleftarrow{\varvec{h}}^{(t)} + \varvec{U}_y \overrightarrow{\varvec{h}}^{(t)} + \varvec{b}_y \right), $$
(9)

where \( x, r, z, h, \tilde{y} \) and i are the input, reset gate, update gate, hidden state, output, and hidden unit index, respectively; \( \overleftarrow {\left( \cdot \right)} \) and \( \overrightarrow {\left( \cdot \right)} \) represent the parameters of the hidden layers in the backward and forward time directions, respectively; W(·) and U(·) are the weight matrices while b(·) is the bias vector; σ(·) is a logistic sigmoid activation function; ϕ(·) is either hyperbolic tangent (tanh) or rectified linear unit (ReLU) activation function; and \( \vartheta \left( \cdot \right) \) is a softmax activation function.
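For readers who want to connect Eqs. (1)–(9) to an implementation, the sketch below shows how a single BGRU layer can be expressed in Keras, the library used later in Sect. 2.3. It is an illustration only: the feature count and class count are assumed placeholders, and the merging of the backward and forward layers is handled by the Bidirectional wrapper.

```python
# A minimal BGRU sketch in Keras (illustrative; not the authors' code).
# mu (features per time step) and num_classes are assumed placeholders.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, GRU, Dense

mu = 16          # assumed number of network traffic features
num_classes = 5  # DDoS, DoS, normal, reconnaissance, information theft

model = Sequential([
    # Backward and forward GRU layers over the input sequence, merged by
    # the wrapper; this mirrors the two hidden layers of Eqs. (1)-(8).
    Bidirectional(GRU(200, activation="relu"), input_shape=(1, mu)),
    # Softmax output corresponding to Eq. (9).
    Dense(num_classes, activation="softmax"),
])
model.summary()
```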

2.2 The Proposed Method for Selection of Optimal BGRU Hyperparameters

The proposed method for optimal selection of BGRU hyperparameters is presented in Algorithm 1. Network traffic features are considered as sequential data given by (10):

$$ \varvec{X} = \left[ \begin{array}{cccc} x_{1,1} & x_{1,2} & \cdots & x_{1,\mu} \\ \vdots & \vdots & \ddots & \vdots \\ x_{\delta,1} & x_{\delta,2} & \cdots & x_{\delta,\mu} \end{array} \right] $$
(10)

where μ is the number of network traffic features, and δ is the number of network traffic samples. The network traffic features in the training, validation, and testing sets are represented by \( \varvec{X}_{tr} \), \( \varvec{X}_{va} \), and \( \varvec{X}_{te} \), respectively. The ground truth labels for training, validation, and testing are represented by \( \varvec{y}_{tr} \), \( \varvec{y}_{va} \), and \( \varvec{y}_{te} \), respectively.

The selection of optimal BGRU hyperparameters will lead to efficient detection and classification of IoT botnet attacks. These hyperparameters include the activation function (\( a_f \)), epoch (\( e_p \)), hidden layer (\( h_l \)), hidden unit (\( h_u \)), batch size (\( b_s \)), and optimizer (\( o_p \)). The optimal choice is made from a set of commonly used hyperparameters through extensive simulations. The collection of the hyperparameters is given by (11)–(16):

$$ \varvec{a}_{\varvec{f}} = \left[ {a_{f,1} , a_{f,2} , \ldots , a_{f,n} } \right], $$
(11)
$$ \varvec{e}_{\varvec{p}} = \left[ {e_{p,1} , e_{p,2} , \ldots , e_{p,m} } \right], $$
(12)
$$ \varvec{h}_{\varvec{l}} = \left[ {h_{l,1} , h_{l,2} , \ldots , h_{l,k} } \right], $$
(13)
$$ \varvec{h}_{\varvec{u}} = \left[ {h_{u,1} , h_{u,2} , \ldots , h_{u,q} } \right], $$
(14)
$$ \varvec{b}_{\varvec{s}} = \left[ {b_{s,1} , b_{s,2} , \ldots , b_{s,v} } \right], $$
(15)
$$ \varvec{o}_{\varvec{p}} = \left[ {o_{p,1} , o_{p,2} , \ldots , o_{p,g} } \right], $$
(16)

where n is the number of activation functions in \( \varvec{a}_f \); m is the number of epochs in \( \varvec{e}_p \); k is the number of hidden layers in \( \varvec{h}_l \); q is the number of hidden units in \( \varvec{h}_u \); v is the number of batch sizes in \( \varvec{b}_s \); and g is the number of optimizers in \( \varvec{o}_p \). The default hyperparameters are the first elements of each set.
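Expressed in code, the candidate sets used later in Sect. 2.3 can be written as plain lists with the default value first, matching the convention above. This is an illustrative sketch; the variable names simply follow the notation of (11)–(16):

```python
# Candidate hyperparameter sets; the first element of each list is the default.
a_f = ["relu", "tanh"]                        # activation functions
e_p = [5, 10, 15, 20]                         # epochs
h_l = [1, 2, 3, 4]                            # hidden layers
h_u = [200, 10, 50, 100, 150]                 # hidden units
b_s = [128, 32, 64, 256, 512]                 # batch sizes
o_p = ["adam", "sgd", "rmsprop", "adadelta"]  # optimizers
```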

Algorithm 1 The proposed method for the selection of optimal BGRU hyperparameters

The development of the BGRU model for efficient IoT botnet detection in smart homes involves two main processes, namely, the choice of a suitable deep network architecture and loss minimization through model training and validation. The deep network architecture (N) for BGRU is determined by \( a_{f,c} \), \( h_{l,d} \), and \( h_{u,j} \). The selected BGRU architecture is trained with \( \varvec{X}_{tr} \), \( \varvec{y}_{tr} \), \( \varvec{X}_{va} \), \( \varvec{y}_{va} \), \( e_{p,\alpha} \), \( b_{s,\beta} \), and \( o_{p,\gamma} \) using the back-propagation through time (BPTT) algorithm [42]. Loss minimization during training and validation is assessed based on the values of the training loss (\( l_{tr} \)) and validation loss (\( l_{va} \)). A categorical cross-entropy loss function is used for loss minimization in the multi-class classification scenario.
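As a concrete illustration of this training step, the sketch below compiles and fits the BGRU model from Sect. 2.1 with the categorical cross-entropy loss; Keras applies BPTT internally when fitting recurrent layers. The arrays X_tr, y_tr, X_va, and y_va are assumed to be preprocessed NumPy arrays of shape (samples, timesteps, features) with one-hot labels.

```python
# Train and validate one candidate architecture (illustrative sketch).
model.compile(
    optimizer="adam",                 # o_p candidate
    loss="categorical_crossentropy",  # multi-class loss used in this work
    metrics=["accuracy"],
)
history = model.fit(
    X_tr, y_tr,
    validation_data=(X_va, y_va),
    epochs=5,        # e_p candidate
    batch_size=128,  # b_s candidate
)
l_tr = history.history["loss"]      # training loss per epoch
l_va = history.history["val_loss"]  # validation loss per epoch
```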

The performance of the BGRU classifier is evaluated based on TPR, FPR, and MCC using the highly imbalanced testing data (\( \varvec{X}_{te} \), \( \varvec{y}_{te} \)). These performance metrics are defined by (17)–(20) [43]:

$$ {\text{TPR}} = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} $$
(17)
$$ {\text{FPR}} = \frac{\text{FP}}{{{\text{FP}} + {\text{TN}}}}, $$
(18)
$$ \lambda = 2\left[ {\frac{{{\text{TP}} + {\text{FN}}}}{{{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}}}} \right] - 1 $$
(19)
$$ {\text{MCC}} = \frac{1}{2}\left\{ {\left[ {\frac{{{\text{TPR}} + {\text{TNR}} - 1}}{{{\text{TPR}} + \left( {1 - {\text{TNR}}} \right)\left( {\frac{1 - \lambda }{1 + \lambda }} \right)}}} \right] + 1} \right\}, $$
(20)

where true positive (TP) is the number of attack samples that are correctly classified; false positive (FP) is the number of normal network traffic samples that are misclassified as attacks; true negative (TN) is the number of normal network traffic samples that are correctly classified; false negative (FN) is the number of attack samples that are misclassified as normal network traffic; and λ is the class imbalance coefficient. For each simulation scenario, an optimal BGRU hyperparameter is expected to produce the lowest \( l_{tr} \), \( l_{va} \), and FPR as well as the highest TPR and MCC.
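For concreteness, a small sketch of how (17)–(20) translate into code is given below; the counts tp, fp, tn, and fn are assumed to come from a per-class confusion matrix.

```python
def evaluate_counts(tp: int, fp: int, tn: int, fn: int):
    """TPR, FPR, and MCC as defined in Eqs. (17)-(20)."""
    tpr = tp / (tp + fn)                           # Eq. (17)
    fpr = fp / (fp + tn)                           # Eq. (18)
    tnr = 1.0 - fpr                                # true negative rate
    lam = 2 * (tp + fn) / (tp + fn + fp + tn) - 1  # Eq. (19): class imbalance
    mcc = 0.5 * ((tpr + tnr - 1)
                 / (tpr + (1 - tnr) * ((1 - lam) / (1 + lam))) + 1)  # Eq. (20)
    return tpr, fpr, mcc

# Example with hypothetical counts:
print(evaluate_counts(tp=950, fp=5, tn=995, fn=50))
```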

2.3 Deep BGRU Classifier for IoT Botnet Detection

The Bot-IoT dataset [13] is made up of network traffic samples generated from a real-life IoT testbed. The testbed is realistic because real IoT devices were included: a weather station, a smart fridge, motion-activated lights, a remote-controlled garage door, and an intelligent thermostat. This network of IoT devices is considered a good representation of an IoT-enabled smart home. The Bot-IoT dataset also contains recent and complex IoT botnet attack samples covering four common scenarios, namely, DDoS, DoS, reconnaissance, and information theft. Accurate ground truth labels are given to the IoT botnet attack samples. The numbers of DDoS attack, DoS attack, normal traffic, reconnaissance attack, and information theft samples in the Bot-IoT dataset are 1,926,624, 1,650,260, 477, 91,082, and 79, respectively.

Network traffic samples, IoT botnet attack samples, and ground truth labels were pre-processed into formats suitable for deep learning. First, the complete dataset was randomly divided into a training set (70%), a validation set (15%), and a testing set (15%), as suggested in the literature [44, 45]. Non-numeric elements of the feature matrices (\( \varvec{X}_{tr} \), \( \varvec{X}_{va} \), \( \varvec{X}_{te} \)) and label vectors (\( \varvec{y}_{tr} \), \( \varvec{y}_{va} \), \( \varvec{y}_{te} \)) were encoded using integer encoding and binary encoding, respectively. Furthermore, the elements of the feature matrices were transformed using min–max normalization such that the value of each element falls between 0 and 1 [46].
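A sketch of this preprocessing pipeline is shown below. The random arrays stand in for the Bot-IoT feature matrix and labels (the feature count of 16 is an illustrative assumption), and the scaler is fitted on the training set only, a common precaution the paper does not spell out.

```python
# Illustrative preprocessing sketch: 70/15/15 split, label encoding,
# one-hot labels, and min-max normalization to [0, 1].
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.utils import to_categorical

X = np.random.rand(1000, 16)  # placeholder for the Bot-IoT feature matrix
labels = np.random.choice(
    ["DDoS", "DoS", "Normal", "Recon", "Theft"], size=1000)

y = to_categorical(LabelEncoder().fit_transform(labels))  # one-hot labels

# 70% training; the remaining 30% is split evenly into validation and testing.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, train_size=0.70,
                                            random_state=0)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.50,
                                          random_state=0)

scaler = MinMaxScaler().fit(X_tr)  # fit on training data only
X_tr, X_va, X_te = map(scaler.transform, (X_tr, X_va, X_te))
```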

The proposed method for the selection of optimal BGRU hyperparameters was then implemented. All simulations were performed at a learning rate of 0.0001. The default hyperparameters for the simulations were the ReLU activation function, five epochs, a single hidden layer, 200 hidden units, a batch size of 128, and the Adam optimizer. We investigated the suitability of the following: tanh and ReLU activation functions; epochs of 5, 10, 15, and 20; 1, 2, 3, and 4 hidden layers; 10, 50, 100, 150, and 200 hidden units; batch sizes of 32, 64, 128, 256, and 512; and the Adam, SGD, RMSprop, and Adadelta optimizers. Model training, validation, and testing were implemented using the Keras library for Python on an Ubuntu 16.04 LTS workstation with the following specifications: 32 GB RAM, an Intel Core i7-9700K CPU @ 3.60 GHz × 8, GeForce RTX 2080 Ti graphics (PCIe/SSE2), and a 64-bit operating system. The optimal BGRU hyperparameters \( \left( {\tilde{a}_{f} , \tilde{e}_{p} , \tilde{h}_{l} , \tilde{h}_{u} , \tilde{b}_{s} , \tilde{o}_{p} } \right) \) were then used to develop a multi-class classifier for efficient IoT botnet detection in smart homes.
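The search itself reduces to a one-factor-at-a-time loop: each candidate value is evaluated while the remaining hyperparameters stay at their defaults, and the value with the lowest validation loss (alongside TPR, FPR, and MCC on the test set) is retained. The sketch below illustrates this for the batch-size sweep, reusing the candidate list b_s and the arrays from the earlier preprocessing sketch; the builder function and all names are illustrative assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Bidirectional, GRU, Dense

def build_bgru(af="relu", hl=1, hu=200, mu=16, num_classes=5):
    """Stack hl BGRU layers; intermediate layers return full sequences."""
    model = Sequential([Input(shape=(1, mu))])
    for layer in range(hl):
        model.add(Bidirectional(
            GRU(hu, activation=af, return_sequences=(layer < hl - 1))))
    model.add(Dense(num_classes, activation="softmax"))
    return model

# Reshape flat feature vectors to (samples, timesteps=1, features).
X_tr_s, X_va_s = X_tr.reshape(-1, 1, 16), X_va.reshape(-1, 1, 16)

best = None
for bs in b_s:  # candidate batch sizes, default (128) first
    model = build_bgru()
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    h = model.fit(X_tr_s, y_tr, validation_data=(X_va_s, y_va),
                  epochs=5, batch_size=bs, verbose=0)
    val_loss = h.history["val_loss"][-1]
    if best is None or val_loss < best[1]:
        best = (bs, val_loss)  # keep the candidate with the lowest val loss
print("best batch size:", best[0])
```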

3 Results and Discussion

In this section, we evaluate the effectiveness of the proposed method for the selection of optimal BGRU hyperparameters in our attempt to develop an efficient IoT botnet detection system for smart home network security. Specifically, we examine the influence of different activation functions, numbers of epochs, numbers of hidden layers, numbers of hidden units, batch sizes, and optimizers on the performance of the BGRU-based multi-class classifier.

3.1 Influence of Activation Functions on Classification Performance

To determine the most appropriate activation function for multi-class classification, the ReLU and tanh activation functions were independently employed in two distinct BGRU neural networks, namely, BGRU-ReLU and BGRU-tanh. Apart from the activation function, each BGRU neural network is made up of a single hidden layer with 200 hidden units. BGRU-ReLU and BGRU-tanh were separately trained and validated with five epochs, a batch size of 128, and the Adam optimizer.

Training and validation losses in BGRU-ReLU and BGRU-tanh were analyzed to understand the extent of model underfitting and overfitting, respectively. Figure 3 shows that the ReLU activation function is more desirable than the tanh activation function. Generally, training and validation losses reduced in both BGRU-ReLU and BGRU-tanh as the number of epochs increased from 1 to 5. However, training and validation losses were lower in BGRU-ReLU than in BGRU-tanh throughout the five-epoch period. At the end of the experiment, the training loss in BGRU-ReLU had reduced to 0.1000, while the validation loss had reduced to 0.0721. Relative to BGRU-tanh, BGRU-ReLU reduced the average training and validation losses by 31.16% and 35.07%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is minimal when the ReLU activation function is used in BGRU. Model underfitting will lead to poor classification accuracy, while model overfitting will adversely affect the generalization ability of the BGRU classifier when applied to previously unseen network traffic samples. Consequently, the adoption of the ReLU activation function in BGRU will help to achieve the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes.

Fig. 3 Training and validation losses of two activation functions

Multi-class classification performance of BGRU-ReLU and BGRU-tanh was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. TPR, also known as sensitivity or recall, is the percentage of samples that were correctly classified; FPR, also known as fall-out or false alarm rate (FAR), is the percentage of samples that were wrongly classified; and MCC is a balanced measure that accounts for the impact of class imbalance on classification performance. Table 1 shows that BGRU-ReLU performed better than BGRU-tanh. Relative to BGRU-tanh, BGRU-ReLU increased TPR and MCC by 24.37% and 25.47%, respectively, while FPR was reduced by 40.86%. The lower the TPR and MCC values, the higher the chance that the BGRU classifier will fail to detect IoT botnets in smart homes. Also, the higher the FPR value, the higher the probability that the BGRU classifier will produce false alarms, i.e., wrongly classify incoming network traffic as an IoT botnet attack. Therefore, the choice of the ReLU activation function in BGRU will ensure a high detection rate and reduce false alarms in a botnet detection system developed for smart homes.

Table 1 Performance of BGRU with different activation functions

3.2 Influence of the Number of Epochs on Classification Performance

In this subsection, we determine the optimal number of epochs required for efficient BGRU-based IoT botnet detection in smart homes. Four single-layer BGRU neural networks were trained and validated with 5, 10, 15, and 20 epochs to produce the BGRU-EP5, BGRU-EP10, BGRU-EP15, and BGRU-EP20 classifiers, respectively. Each of these classifiers utilized 200 hidden units, the ReLU activation function, a batch size of 128, and the Adam optimizer.

Training and validation losses in BGRU-EP5, BGRU-EP10, BGRU-EP15, and BGRU-EP20 were analyzed to understand the extent of model underfitting and overfitting, respectively. Figure 4 shows that the lowest average training and validation losses were realized with 20 epochs. In general, training and validation losses reduced in all the classifiers throughout the training period. However, average training and validation losses were lower in BGRU-EP20 than in BGRU-EP5, BGRU-EP10, and BGRU-EP15. At the end of the experiment, the training loss in BGRU-EP20 had reduced to 0.0095, while the validation loss had reduced to 0.0094. BGRU-EP20 reduced the average training losses in BGRU-EP5, BGRU-EP10, and BGRU-EP15 by 58.89%, 37.99%, and 17.87%, respectively, while the average validation losses were reduced by 53.80%, 35.22%, and 15.82%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is best minimized when the number of epochs in BGRU is 20. In other words, a sufficiently large number of epochs in BGRU will facilitate the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes.

Fig. 4 Mean training and validation loss of different epochs

Multi-class classification performance of BGRU-EP5, BGRU-EP10, BGRU-EP15, and BGRU-EP20 was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. Table 2 shows that BGRU-EP20 performed better than BGRU-EP5, BGRU-EP10, and BGRU-EP15. BGRU-EP20 increased TPR by 7.97%, 3.65%, and 1.05% relative to BGRU-EP5, BGRU-EP10, and BGRU-EP15, respectively; FPR decreased by 87.27%, 73.08%, and 46.15%, respectively; and MCC increased by 2.34%, 1.05%, and 0.32%, respectively. Therefore, a sufficiently large number of epochs in BGRU will ensure a high detection rate and reduce false alarms in a botnet detection system developed for smart homes.

Table 2 Performance of BGRU at different numbers of epochs

3.3 Influence of the Number of Hidden Layers on Classification Performance

In this subsection, we determine the optimal number of hidden layers required for efficient BGRU-based IoT botnet detection in smart homes. Four BGRU neural networks with 1, 2, 3, and 4 hidden layers formed the BGRU-HL1, BGRU-HL2, BGRU-HL3, and BGRU-HL4 classifiers, respectively, when trained using 200 hidden units, the ReLU activation function, five epochs, a batch size of 128, and the Adam optimizer.

Training and validation losses in BGRU-HL1, BGRU-HL2, BGRU-HL3, and BGRU-HL4 were analyzed to understand the extent of model underfitting and overfitting, respectively. Figures 5 and 6 show that the lowest training and validation losses were realized with four hidden layers. In general, training and validation losses reduced in all the classifiers throughout the five-epoch period. However, training and validation losses were lower in BGRU-HL4 than in BGRU-HL1, BGRU-HL2, and BGRU-HL3. At the end of the experiment, the training loss in BGRU-HL4 had reduced to 0.0057, while the validation loss had reduced to 0.0060. BGRU-HL4 reduced the average training losses in BGRU-HL1, BGRU-HL2, and BGRU-HL3 by 79.22%, 45.93%, and 13.43%, respectively, while the average validation losses were reduced by 82.44%, 43.88%, and 4.67%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is best minimized when the number of hidden layers in BGRU is 4. In other words, a sufficiently deep BGRU will facilitate the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes.

Fig. 5 Training loss for different numbers of hidden layers

Fig. 6 Validation loss for different numbers of hidden layers

Multi-class classification performance of BGRU-HL1, BGRU-HL2, BGRU-HL3, and BGRU-HL4 was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. Table 3 shows that BGRU-HL4 performed better than BGRU-HL1, BGRU-HL2, and BGRU-HL3. BGRU-HL4 increased TPR by 9.80%, 1.71%, and 1.69% relative to BGRU-HL1, BGRU-HL2, and BGRU-HL3, respectively; FPR decreased by 87.27%, 12.50%, and 0%, respectively; and MCC increased by 2.79%, 0.45%, and 0.44%, respectively. Therefore, a sufficiently deep BGRU will ensure a high detection rate and reduce false alarms in a botnet detection system developed for smart homes.

Table 3 Performance of BGRU at different numbers of hidden layers

3.4 Influence of Hidden Units on Classification Performance

In this subsection, we determine the optimal number of hidden units required for efficient BGRU-based IoT botnet detection in smart homes. Five single-layer BGRU neural networks with 10, 50, 100, 150, and 200 hidden units formed the BGRU-HU1, BGRU-HU2, BGRU-HU3, BGRU-HU4, and BGRU-HU5 classifiers, respectively, when trained using the ReLU activation function, five epochs, a batch size of 128, and the Adam optimizer.

Training and validation losses in BGRU-HU1, BGRU-HU2, BGRU-HU3, BGRU-HU4, and BGRU-HU5 were analyzed to understand the extent of model underfitting and overfitting, respectively. Figures 7 and 8 show that the lowest training and validation losses were realized with 200 hidden units. In general, training and validation losses reduced in all the classifiers throughout the five-epoch period. However, training and validation losses were lower in BGRU-HU5 than in BGRU-HU1, BGRU-HU2, BGRU-HU3, and BGRU-HU4. At the end of the experiment, the training loss in BGRU-HU5 had reduced to 0.0486, while the validation loss had reduced to 0.0458. BGRU-HU5 reduced the average training losses in BGRU-HU1, BGRU-HU2, BGRU-HU3, and BGRU-HU4 by 53.54%, 31.72%, 21.14%, and 11.82%, respectively, while the average validation losses were reduced by 54.83%, 32.94%, 21.85%, and 12.07%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is best minimized when the number of hidden units in BGRU is 200. In other words, a sufficiently large number of hidden units in BGRU will facilitate the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes.

Fig. 7 Training loss for different numbers of hidden units

Fig. 8 Validation loss for different numbers of hidden units

Multi-class classification performance of BGRU-HU1, BGRU-HU2, BGRU-HU3, BGRU-HU4, and BGRU-HU5 was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. Table 4 shows that BGRU-HU5 performed better than BGRU-HU1, BGRU-HU2, BGRU-HU3, and BGRU-HU4. BGRU-HU5 increased TPR by 53%, 25.61%, 1.75%, and 0.58% relative to BGRU-HU1, BGRU-HU2, BGRU-HU3, and BGRU-HU4, respectively; FPR decreased by 56.69%, 25.68%, 17.91%, and 8.33%, respectively; and MCC increased by 64.86%, 25.66%, 0.56%, and 0.19%, respectively. Therefore, a sufficiently large number of hidden units in BGRU will ensure a high detection rate and reduce false alarms in an IoT botnet detection system developed for smart homes.

Table 4 Performance of BGRU at different numbers of hidden units

3.5 Influence of Batch Size on Classification Performance

In this subsection, we determine the optimal batch size required for efficient BGRU-based IoT botnet detection in smart homes. Five single-layer BGRU neural networks with batch sizes of 32, 64, 128, 256, and 512 formed the BGRU-B32, BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512 classifiers, respectively, when trained using 200 hidden units, the ReLU activation function, five epochs, and the Adam optimizer.

Training and validation losses in BGRU-B32, BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512 were analyzed to understand the extent of model underfitting and overfitting, respectively. Figures 9 and 10 show that the lowest training and validation losses were realized with a batch size of 32. In general, training and validation losses reduced in all the classifiers throughout the five-epoch period. However, training and validation losses were lower in BGRU-B32 than in BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512. At the end of the experiment, the training loss in BGRU-B32 had reduced to 0.0287, while the validation loss had reduced to 0.0262. BGRU-B32 reduced the average training losses in BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512 by 18.49%, 35.38%, 49.58%, and 62.48%, respectively, while the average validation losses were reduced by 20.52%, 38.30%, 52.24%, and 64.55%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is best minimized when the batch size in BGRU is 32. In other words, a sufficiently small batch size in BGRU will facilitate the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes.

Fig. 9 Training loss of different batch sizes

Fig. 10 Validation loss of different batch sizes

Multi-class classification performance of BGRU-B32, BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512 was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. Table 5 shows that BGRU-B32 performed better than BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512. BGRU-B32 increased TPR by 0.42%, 0.86%, 12.80%, and 26.49% relative to BGRU-B64, BGRU-B128, BGRU-B256, and BGRU-B512, respectively; FPR decreased by 35%, 52.73%, 60%, and 69.05%, respectively; and MCC increased by 0.21%, 0.42%, 3.84%, and 26.24%, respectively. Therefore, a sufficiently small batch size in BGRU will ensure a high detection rate and reduce false alarms in a botnet detection system developed for smart homes. However, Fig. 11 shows that training time decreased as the batch size increased: BGRU-B32 took the longest time (101.35 min) to train, while the shortest training time of 6.70 min was achieved by BGRU-B512.

Table 5 Performance of BGRU at different batch sizes
Fig. 11 Training time of different batch sizes

3.6 Influence of Optimizers on Classification Performance

In this subsection, we determine the most suitable optimizer for efficient BGRU-based IoT botnet detection in smart homes. Four single-layer BGRU neural networks with the Adam, SGD, RMSprop, and Adadelta optimizers formed the BGRU-OP1, BGRU-OP2, BGRU-OP3, and BGRU-OP4 classifiers, respectively, when trained using the ReLU activation function, five epochs, and a batch size of 128.

Training and validation losses in BGRU-OP1, BGRU-OP2, BGRU-OP3, and BGRU-OP4 were analyzed to understand the extent of model underfitting and overfitting, respectively. Figures 12 and 13 show that the lowest training and validation losses were realized with the Adam optimizer. In general, training and validation losses reduced in all the classifiers throughout the five-epoch period. However, training and validation losses were lower in BGRU-OP1 than in BGRU-OP2, BGRU-OP3, and BGRU-OP4. At the end of the experiment, the training loss in BGRU-OP1 had reduced to 0.0486, while the validation loss had reduced to 0.0458. BGRU-OP1 reduced the average training losses in BGRU-OP2, BGRU-OP3, and BGRU-OP4 by 86.54%, 4.22%, and 90.40%, respectively, while the average validation losses were reduced by 89.70%, 5.49%, and 92.31%, respectively. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is best minimized when the Adam optimizer is employed in BGRU. In other words, the use of the Adam optimizer in BGRU will facilitate the high classification accuracy and good generalization ability required for efficient botnet detection in smart homes.

Fig. 12 Training loss of different optimizers

Fig. 13 Validation loss of different optimizers

Multi-class classification performance of BGRU-OP1, BGRU-OP2, BGRU-OP3, and BGRU-OP4 was evaluated with respect to the ground truth labels based on TPR, FPR, and MCC. Table 6 shows that BGRU-OP1 performed better than BGRU-OP2, BGRU-OP3, and BGRU-OP4. BGRU-OP1 increased TPR by 200.67%, 40.74%, and 333.43% relative to BGRU-OP2, BGRU-OP3, and BGRU-OP4, respectively; FPR decreased by 94.84%, 16.67%, and 97.14%, respectively; and MCC increased by 224.96%, 30.78%, and 372.31%, respectively. Therefore, the adoption of the Adam optimizer in BGRU will ensure a high detection rate and reduce false alarms in a botnet detection system developed for smart homes.

Table 6 Performance of BGRU for different optimizers

3.7 Performance of Deep BGRU-Based Multi-class Classifier

In this subsection, we evaluate the suitability of deep BGRU for IoT botnet detection in smart homes. A deep BGRU multi-class classifier was developed with the optimal hyperparameters from Sects. 3.1–3.6, namely, the ReLU activation function, 20 epochs, 4 hidden layers, 200 hidden units, and the Adam optimizer, together with a batch size of 512, which was preferred over 32 for its much shorter training time (see Sect. 3.5).

Training and validation losses of the deep BGRU multi-class classifier were analyzed to understand the extent of model underfitting and overfitting, respectively. Figure 14 shows that the training and validation losses were very low when the optimal BGRU hyperparameters were used. Training and validation losses reduced throughout the 20-epoch period. At the end of the experiment, the training loss of the deep BGRU multi-class classifier had reduced to 0.0018, while the validation loss had reduced to 0.0006. The reduction in training and validation losses implies that the likelihood of model underfitting and overfitting is minimized when the optimal hyperparameters are employed in BGRU. In other words, the choice of the optimal BGRU hyperparameters facilitates the high classification accuracy and good generalization ability required for efficient IoT botnet detection in smart homes. The time needed to train the optimal deep BGRU classifier was 33.95 min.
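For reference, the final configuration can be reproduced with the builder sketched in Sect. 2.3; the snippet below is an illustrative sketch under the same assumed data arrays, not the authors' released code.

```python
# Deep BGRU classifier with the selected hyperparameters: ReLU activation,
# 4 hidden layers, 200 hidden units, Adam optimizer, 20 epochs, batch size 512.
final_model = build_bgru(af="relu", hl=4, hu=200)
final_model.compile(optimizer="adam", loss="categorical_crossentropy",
                    metrics=["accuracy"])
final_model.fit(X_tr_s, y_tr, validation_data=(X_va_s, y_va),
                epochs=20, batch_size=512)
```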

Fig. 14 Training and validation losses of the deep BGRU classifier

Multi-class classification performance of the deep BGRU classifier was compared with state-of-the-art methods based on TPR, FPR, and MCC. Tables 7, 8, and 9 show that the deep BGRU multi-class classifier outperforms mixture localization-based outliers (MLO) [47], SVM [48], RF [49], artificial immune system (AIS) [50], and feedforward neural network (FFNN) [51]. The deep BGRU multi-class classifier achieved high detection accuracy and low false alarm rates, with TPR, FPR, and MCC of 99.28 ± 1.57%, 0.00 ± 0.00%, and 99.82 ± 0.40%, respectively.

Table 7 TPR of multi-class classifiers for IoT botnet detection in smart homes
Table 8 FPR of multi-class classifiers for IoT botnet detection in smart homes
Table 9 MCC of multi-class classifiers for IoT botnet detection in smart homes

4 Conclusion

In this paper, an optimal model was developed for efficient botnet detection in IoT-enabled smart homes using deep BGRU. A methodology was proposed to determine the optimal BGRU hyperparameters (activation function, epoch, hidden layer, hidden unit, batch size, and optimizer) for multi-class classification. The proposed methodology was implemented, and the classification performance was jointly assessed based on training loss, validation loss, accuracy, TPR, FPR, MCC, and training time. Extensive simulation results showed that: (a) ReLU performed better than the tanh activation function; (b) classification performance improved with an increase in the numbers of epochs, hidden layers, and hidden units; (c) the performance of BGRU improved as the batch size became smaller, but at the cost of a significant increase in training time; and (d) Adam outperformed the SGD, RMSprop, and Adadelta optimizers. Finally, the combination of the ReLU activation function, 20 epochs, 4 hidden layers, 200 hidden units, a batch size of 512, and the Adam optimizer achieved the best multi-class classification performance: low training loss (0.0107 ± 0.0219), validation loss (0.0072 ± 0.0086), and FPR (0.00 ± 0.00%); and high accuracy (99.99%), TPR (99.28 ± 1.57%), and MCC (99.82 ± 0.40%).