1 Introduction

The Internet has become one of the most advanced and powerful communication tools in the world. It is a key factor in creating an environment that motivates innovation and creativity. Building such an environment requires continuous development of wireless technologies and the embedding of sensors into everyday objects, so that each object can operate independently while remaining connected to the Internet. This concept was named the Internet of Things (IoT) by Kevin Ashton [1]. The IoT is a network of physical objects that allows them to gather and share data with computers, instruments, cars, buildings, and other items embedded with processors, circuitry, software, sensors, and network connectivity [2,3,4].

Wireless Sensor Networks (WSNs) provide connectivity for the data captured by sensors and IoT devices, which is used to record, monitor, and control various environmental conditions such as water quality, temperature, and air quality [5]. A WSN contains many sensors, known as nodes, and each node has two tasks: data originator and data router. Each node consists of four components: a sensing unit, a processing unit, a transceiver, and a power source. These components are usually constrained, since each sensor has limited storage, low power, and limited processing capability. A WSN also contains a sink node, also called the base station, which collects the data from the other sensor nodes and acts as the gateway between the sensor nodes and the data processing center [6]; Fig. 1 illustrates these components.

Fig. 1 Wireless sensor networks (WSN) [6]

Although WSN is a valuable technology, it has drawbacks such as inadequate protection and performance limitations, including insufficient memory and limited sensor battery power, which make sensor networks vulnerable to attacks [7]. Traditional security mechanisms are therefore not enough to detect intrusions in WSNs. Several issues may threaten the security of WSNs, including data confidentiality, data authenticity, and data integrity. An intrusion detection system (IDS) is considered one of the critical methods for defending against attackers. It is an active field of research, and researchers keep proposing new algorithms for inspecting all inbound and outbound activities to identify suspicious patterns. Intrusion detection monitors the events occurring in a computer system or network and analyzes them for signs of intrusions [8].

Detecting intrusions depends on understanding how a cyber-attack works. In most cases, such abnormal activity consumes network resources intended for legitimate uses and affects the network's security and data. There are many types of cyber-attacks, such as device compromise, service disruption, data exfiltration, false data injection, and advanced persistent threats that aim to gain extended access to a device [9]. An IDS also relies on methods that detect anomalous traffic (unusual traffic activity) using computer algorithms. Intrusion detection methodologies are therefore classified into three major categories: Signature-based Detection (SD), Anomaly-based Detection (AD), and Stateful Protocol Analysis (SPA) [10].

Once the IDS has identified dangerous or suspicious traffic, it either blocks this activity itself or sends alerts to an intrusion prevention system (IPS) to stop the action and prevent the intrusion. IDS studies are essentially classification tasks that separate the expected behavior of networks from attacks [11]. Machine learning and data mining algorithms are needed to accomplish this task, since attackers do not follow a unique pattern and continuously use various tools and methods. Many machine learning techniques have been used for intrusion detection, such as SVM and K-means [12, 13]. These techniques classify network connection data into two classes, normal or attack, based on the connection's features. Before the classification step, it is useful to apply optimization algorithms for feature selection. The feature selection process aims to pick relevant features and exclude irrelevant or redundant ones to increase the accuracy of the classification process. Many optimization algorithms are inspired by nature, such as Grey Wolf Optimization (GWO), Particle Swarm Optimization (PSO), and the Arithmetic Optimization Algorithm (AOA) [14]. The general aim of these algorithms is to find the highest-quality solutions with the best convergence performance [15,16,17]. They must also balance exploration and exploitation: exploration covers all areas of the search space, while exploitation is used to find the optimal solutions within a promising region.

In wireless sensor networks, intrusion detection systems face several problems, such as low detection accuracy, high false alarm rates, and long processing times. These problems are caused by the vast amount of wireless intrusion traffic collected by the sensors, together with the fact that attackers do not follow a unique pattern and continuously use various tools and methods. In this paper, the protection system against intrusions is built on selecting the significant features that assist the classification process, which increases detection quality and accuracy and reduces execution time. The feature selection method is based on the GWO and PSO algorithms, since GWO is powerful in this area and can determine the relevant features. However, GWO also has weaknesses, such as slow convergence and limited accuracy. Hybridization with PSO is adopted to compute the best next position when updating each grey wolf's location and to prevent the GWO algorithm from falling into a local optimum, given PSO's capacity for finding the global optimum, its convergence behavior, and its simplicity. This paper focuses on extracting the optimal subset of features. The feature selection step reduces the data dimensions by excluding irrelevant or redundant features, which enhances accuracy and reduces execution time, leading to a higher detection rate and a lower false alarm rate. This was achieved using the proposed hybrid Grey Wolf Optimizer with Particle Swarm Optimizer, which reduces and selects features and yields a significant improvement in accuracy, detection rate, false alarm rate, and the speed of the entire process. The proposed method tackles the conventional Grey Wolf Optimizer's weaknesses by adding the search operators of the Particle Swarm Optimizer, and it addresses the challenges mentioned above. Finally, this work offers a solid basis for future IDS research in many fields. The main contributions of this paper are listed as follows.

  1. A new optimization method is proposed for solving the intrusion detection problem in wireless sensor networks.

  2. The proposed method uses a hybrid search strategy that combines the main operators of the Grey Wolf Optimizer and the Particle Swarm Optimizer.

  3. The proposed method is validated on benchmark datasets used in the domain of intrusion detection systems.

  4. The results of the proposed method demonstrate its ability to solve intrusion detection problems compared with other methods.

The rest of the paper is organized as follows. Section 2 presents an overview of WSNs, their challenges, threats and attacks, WSN protection, and machine learning-based detection with feature selection, and it discusses related studies. Section 3 presents the details of the proposed technique and describes each stage of the proposed method. Section 4 reports the experimental results of the proposed technique and compares them against some existing methods. Finally, Sect. 5 concludes this research.

2 Literature review

As mentioned earlier, WSNs consist of sensor nodes distributed in different places and interconnected in a wireless network to collect information. The distributed nature and open wireless medium make WSNs vulnerable to security attacks at various levels. Their self-organizing nature, low-battery power supply, limited bandwidth, and dependency on other nodes expose sensor networks to many security attacks at all layers of the OSI model [18]. Sensor network security is therefore a critical point in WSNs. Confidentiality and privacy are necessary for sensitive information, for example security data or military information, so the network must be capable of resisting different attacks. One of the most severe challenges is how to protect the WSN, since the wireless medium makes it easier for an attacker to eavesdrop on the traffic and cripple communication. A group of security issues and threats may face WSNs [19, 20]; the following paragraphs summarize several of them.

There are many types of IDSs with different configurations that serve the same purpose of notifying the system or security administrator [21]. An IDS provides reports about abnormal activities, and some IDSs respond by preventing the threat or any attempted attack. Most IDSs identify threats using two common methods, Signature-based Detection (SD) and Anomaly-based Detection (AD). The first compares any received packet with a database of known attack signatures to identify malicious behavior [22], while the second depends on behavioral models. Anomaly-based techniques are categorized by their processing type, as presented in Fig. 2: statistical-based, computer immunology, user intention identification, and machine learning-based [23].

Fig. 2 Anomaly-based Detection (AD) [23]

Machine learning techniques have emerged as an effective solution for detecting malicious patterns by learning these patterns in a model, whether a single classifier, a hybrid classifier, or an ensemble classifier. Many classification algorithms are used for this purpose, such as K-Nearest Neighbor, Self-Organizing Maps, Decision Trees, Random Forest (RF), Naïve Bayes, Artificial Neural Networks (ANN), and Support Vector Machines (SVM) [24, 25].

An improved intrusion detection system is proposed in [19], using the NSL-KDD dataset to evaluate the method. The developed method selects features from the dataset by increasing the number of wolves of the original GWO algorithm: two wolves were added to make five wolves, and in another experiment four were added to make seven. Classification was then performed with the SVM algorithm, and comparisons were made in terms of accuracy, detection rate, execution time, number of features, and false alarm rate. The results showed that seven wolves gave the best results. For intrusion detection, Chahal and Kaur [26] suggested a hybrid solution that combines classification, using an Adaptive-SVM algorithm, with clustering, using the K-means algorithm. The proposed system addresses the problems of high false-positive and false-negative rates and produces better accuracy. The NSL-KDD dataset was used to evaluate the performance of this study.

To detect intrusions and compare their results, Malviya and Jain [27] studied two decision tree classifiers (J48 and ID3). The researchers used an attribute selection filter to implement the feature selection step, and the KDDCUP 99 dataset and a basic k-means algorithm were used for data analysis. The results showed that J48 has better classification accuracy, with a higher True Positive Rate (TPR) and a lower False Positive Rate (FPR) than the ID3 decision tree classifier. Shukla and Vashishtha [28] proposed a new hybrid intrusion detection system based on data mining techniques; the suggested method combines three different data mining techniques to improve the execution efficiency of the IDS. The first stage clusters related data instances based on their behavior, using clustering as a pre-classification component. The second stage groups the resulting clusters into attack groups using the Apriori algorithm. In the last step, classification is applied using a decision tree. KDD'99 is used to evaluate the IDS. In terms of precision and performance, the proposed IDS performed better, as it can classify attacks into four categories: Probe, Denial of Service (DoS), U2R (User to Root), and R2L (Remote to Local).

An intrusion detection system based on a parallel particle swarm optimization clustering algorithm using the MapReduce methodology is suggested in [29]. PSO was used for the clustering task because it avoids sensitivity to the initial cluster centroids and premature convergence. The results showed a better detection rate while keeping the false alarm rate very low, along with better detection speed; KDD99 was used to evaluate the proposed system. In [30], an IDS using k-means is presented to build a higher-efficiency, lower-false-alarm IDS on the NSL-KDD dataset. The k-means clustering results showed that a higher performance rating is obtained when the correct number of clusters is used; increasing or reducing the number of clusters relative to the number of data types affects the model's efficiency, so defining the number of groups affects the findings dramatically. One must know in advance how many clusters are required to attain accurate results; based on the various types of data, 22 clusters were used in that model. In a complex network it would be difficult to identify the number of clusters, since there is no ground truth to act as the basis for determining it.

Li and Xu [31] suggested an anomaly intrusion detection system that combines the K-means clustering algorithm with particle swarm optimization (PSO-KM). Experiments on the KDD CUP 99 dataset revealed the effectiveness of the proposed solution, with a high detection rate and a low false detection rate. The PSO-KM algorithm combines the particle swarm optimization algorithm with the traditional K-means clustering algorithm and has good overall optimization capability. The results illustrate that PSO-KM is an effective method when dealing with large datasets, that its detection rate is improved for both known and unknown attacks, and that it enhances the practical value of the K-means clustering algorithm in intrusion detection. The technique proposed in this paper achieves good results with both K-means and SVM in terms of detection rate, accuracy, false alarm rate, number of features, and overall process time. Table 1 summarizes the differences between the related studies mentioned in this section and the proposed technique. We conclude that existing methods usually rely on optimization methods to solve IDS problems, and it is clear that a new, effective method is needed to address the IDS problems mentioned above.

Table 1 Result of related studies

3 The proposed method

The mission of an IDS is to detect malicious activities, and machine learning techniques are powerful at processing massive amounts of wireless intrusion traffic to classify abnormal and normal traffic. As mentioned previously, feature selection improves the classifier algorithms in terms of processing time and detection accuracy by reducing the number of features. In this paper, the feature selection step is carried out using optimization algorithms. An optimization algorithm is a computer program that generates candidate solutions procedurally and adjusts several parameters to minimize or maximize an objective. The main challenge is identifying the parameters that formulate the problem and designing a robust optimization algorithm to find the best design. The main components of an optimization problem, shown in Fig. 3, are a set of decision variables, an objective function, bounds on the decision variables, and constraints.

Fig. 3 Components of an optimization problem [32]

As shown in Fig. 4, PSO and GWO are used in the proposed technique; they belong to the same family of population-based algorithms, called swarm-based algorithms [33]. To achieve the main objective of this paper, a quantitative research methodology is used. The data were imported from the NSL-KDD dataset that is available online (https://www.unb.ca/cic/datasets/nsl.html). The feature selection step is performed using the proposed hybridized algorithm. Finally, the classification process is applied using K-means and SVM to divide the records of the dataset into two separate groups: normal or attack.

Fig. 4 The proposed IDS method

3.1 NSL-KDD dataset

The NSL-KDD dataset is an updated version of the KDD Cup 99 dataset, proposed to solve the problems of the previous version. It has several advantages over the original KDD dataset [34]: a sufficient number of records is available in the training and testing sets, there are no duplicate records in the test set, and there are no redundant records in the training set. Each record has 41 features and is labeled as either an attack type or normal. Each feature belongs to one of three attribute value types (nominal, binary, or numeric). Figure 5 shows the 41 features of the NSL-KDD dataset.

Fig. 5 Features of the NSL-KDD dataset

The attack classes are categorized into four groups: DoS (Denial of Service), Probing (surveillance and other probing attacks), U2R (unauthorized access to local superuser privileges), and R2L (unauthorized access from a remote machine). Table 2 shows the types of attacks according to this categorization [35]. Table 3 shows the distribution of the normal and attack records in the various NSL-KDD datasets [35].

Table 2 Types of attacks
Table 3 Distribution of the normal and attack records

3.2 Preparing dataset

This section presents a data preparation and preprocessing framework for producing high-quality data for the experimental analysis. The experimental study was conducted on the intrusion detection data. The following two subsections illustrate the main phases of the dataset's preprocessing: data transformation and normalization.

3.2.1 Data transformation

In the data transformation stage, all nominal attributes are converted to numeric values. For example, the protocol type attribute is given as an integer number by converting the original values to numerical values such as tcp = 1, udp = 2, and icmp = 3. As shown in Fig. 6, the same transformation technique is adopted to convert the remaining nominal values [19].
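As an illustration, this mapping can be sketched in a few lines of code. The snippet below is a minimal Python/pandas sketch (the experiments in this paper were run in MATLAB); it assumes the usual NSL-KDD column names `protocol_type`, `service`, and `flag` for the nominal attributes.

```python
import pandas as pd

def transform_nominal(df: pd.DataFrame) -> pd.DataFrame:
    """Replace nominal attributes with integer codes, e.g. tcp=1, udp=2, icmp=3."""
    out = df.copy()
    # Explicit mapping for protocol_type, following the example in the text.
    out["protocol_type"] = out["protocol_type"].map({"tcp": 1, "udp": 2, "icmp": 3})
    # The remaining nominal attributes (assumed here to be 'service' and 'flag')
    # are encoded the same way, assigning consecutive integers starting from 1.
    for col in ["service", "flag"]:
        codes = {v: i + 1 for i, v in enumerate(sorted(out[col].unique()))}
        out[col] = out[col].map(codes)
    return out
```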

Fig. 6 Transformation methodology [19]

3.2.2 Normalization

In the data normalization process, values are scaled using Eq. (1), since the NSL-KDD dataset attributes are not distributed uniformly.

$$X^{\prime} = \frac{Original\;value - Min\;value}{Max\;value - Min\;value}$$
(1)

where X′ is the normalized value [36]. Figure 7 shows the data set before the normalization phase, and Fig. 8 shows the dataset after the normalization phase.
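For completeness, Eq. (1) can be implemented column-wise as in the following minimal NumPy sketch; the guard against a zero range is an added assumption to handle constant-valued attributes.

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1] using Eq. (1): X' = (X - min) / (max - min)."""
    X = X.astype(float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    rng = np.where(col_max - col_min == 0, 1.0, col_max - col_min)  # avoid division by zero
    return (X - col_min) / rng
```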

Fig. 7 Dataset before the normalization phase

Fig. 8 Dataset after the normalization phase

3.3 The proposed feature selection method

In this process, the feature selection algorithm is built based on hybridizing the Gray Wolf Optimizer (GWO) with Particle Swarm Optimizer (PSO).

3.3.1 Grey wolf optimization (GWO)

This algorithm was introduced by Mirjalili in [37] and is inspired by the behavior of wolves in nature. It mirrors the hunting strategies and leadership hierarchy of grey wolves. Alpha (α) wolves lead the pack, while Beta (β) wolves form the second level of the hierarchy and assist the Alphas. The next levels of the hierarchy contain the Delta (δ) and Omega (ω) wolves: Delta wolves follow the upper levels of the hierarchy and control the Omega wolves. Figure 9 illustrates the grey wolf hierarchy.

Fig. 9 Hierarchy of the grey wolf

The main phases of grey wolf hunting are as follows [37]:

  • Tracking, chasing and approaching the prey.

  • Pursuing, encircling, and harassing the prey until it stops moving.

  • Attacking the prey.

Figure 10 presents the hunting behavior of grey wolves: (A) chasing, approaching, and tracking the prey; (B–D) pursuing, harassing, and encircling; and (E) the stationary situation and attack.

Fig. 10 Hunting behavior of grey wolves [37]

The exploration search is performed by the upper three levels of wolves and aims to find the best position. The following Eqs. (2)–(12) describe how the grey wolves surround the prey [37].

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{P}(t) - \vec{X}(t) \right|$$
(2)
$$\vec{X}(t+1) = \vec{X}_{P}(t) - \vec{A} \cdot \vec{D}$$
(3)

where \(t\) is the current iteration number, \(\vec{X}_{P}\) is the position vector of the prey, \(\vec{X}\) is the position vector of a grey wolf, and \(\vec{A}\) and \(\vec{C}\) are coefficient vectors calculated by:

$$\vec{A} = 2\vec{a} \cdot \vec{r}_{1} - \vec{a}$$
(4)
$$\vec{C} = 2 \cdot \vec{r}_{2}$$
(5)

where \(\vec{a}\) is the exploration rate (linearly decreased from 2 to 0 over the course of the iterations), and \(\vec{r}_{1}\) and \(\vec{r}_{2}\) are random vectors in [0, 1].

Figure 11 illustrates the possible areas to which a wolf can move when updating its position according to the position of the prey; the same figure shows the three-dimensional position update of the grey wolf. At each iteration, the values of \(\vec{A}\) and \(\vec{C}\) drive the position update, and with the random vectors \(\vec{r}_{1}\) and \(\vec{r}_{2}\) the grey wolf can reach any random position defined by Eqs. (2) and (3). When \(|\vec{A}|\) becomes less than 1, the grey wolf attacks the prey, but the random vectors \(\vec{r}_{1}\) and \(\vec{r}_{2}\) may also cause the grey wolf to fall into a local optimum [38].

Fig. 11 Position vectors and their possible next locations [37]

The hunting phase is mainly guided by Alpha (α), while Beta (β) and Delta (δ) may also participate occasionally. Since the location of the prey (the optimum) is not known in advance, the three best candidate solutions, Alpha, Beta, and Delta, are assumed to have good knowledge of the prey's position [37].

$$\vec{D}_{\alpha} = \left| \vec{C}_{1} \cdot \vec{X}_{\alpha} - \vec{X} \right|$$
(6)
$$\vec{D}_{\beta} = \left| \vec{C}_{2} \cdot \vec{X}_{\beta} - \vec{X} \right|$$
(7)
$$\vec{D}_{\delta} = \left| \vec{C}_{3} \cdot \vec{X}_{\delta} - \vec{X} \right|$$
(8)

After obtaining the above distance vectors, the wolves perform the final position update as follows [37].

$$\vec{X}_{1} = \vec{X}_{\alpha} - \vec{A}_{1} \cdot \vec{D}_{\alpha}$$
(9)
$$\vec{X}_{2} = \vec{X}_{\beta} - \vec{A}_{2} \cdot \vec{D}_{\beta}$$
(10)
$$\vec{X}_{3} = \vec{X}_{\delta} - \vec{A}_{3} \cdot \vec{D}_{\delta}$$
(11)
$$\vec{X}(t+1) = \frac{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3}}{3}$$
(12)

At each iteration, a grey wolf's new position is estimated from the positions of the three best wolves; \(\vec{X}(t+1)\) is the updated position of the next generation of wolves, and each candidate solution updates its distance to the prey. Figure 12 presents the pseudocode and Fig. 13 shows the flowchart of the Grey Wolf Optimization (GWO) algorithm.

Fig. 12 Pseudocode of the GWO algorithm

Fig. 13 Flowchart of the Grey Wolf Optimization (GWO) algorithm [37]
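To make Eqs. (4)–(12) concrete, the sketch below shows one GWO iteration in Python/NumPy. It is an illustrative reading of the update rules above, not the authors' MATLAB implementation; the fitness function and the binary feature-selection encoding are left out.

```python
import numpy as np

def gwo_step(wolves, fitness, a):
    """One GWO iteration: update every wolf using Eqs. (4)-(12).

    wolves  : (n_wolves, dim) array of positions
    fitness : callable mapping a position to a scalar (lower is better)
    a       : exploration rate, linearly decreased from 2 to 0 over the iterations
    """
    scores = np.array([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[np.argsort(scores)[:3]]   # three best wolves

    new_wolves = np.empty_like(wolves)
    for i, X in enumerate(wolves):
        candidates = []
        for leader in (alpha, beta, delta):
            r1, r2 = np.random.rand(X.size), np.random.rand(X.size)
            A = 2 * a * r1 - a                      # Eq. (4)
            C = 2 * r2                              # Eq. (5)
            D = np.abs(C * leader - X)              # Eqs. (6)-(8)
            candidates.append(leader - A * D)       # Eqs. (9)-(11)
        new_wolves[i] = np.mean(candidates, axis=0) # Eq. (12)
    return new_wolves
```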

3.3.2 Particle swarm optimization (PSO)

This algorithm was introduced by Kennedy and Eberhart in 1995 [39]. It was inspired by the way birds flock while searching for food: once one bird finds food, it sends a message to the remaining birds to keep them updated about the food's position. The PSO algorithm aims to find the global optimum; every bird in the swarm is called a particle [15]. The main advantages of PSO are that it has few parameters to adjust and converges quickly. It comprises the following steps [40]: initialize each particle with a random position and velocity, evaluate the fitness of each particle, update \(P_{best}\) and \(G_{best}\) of each particle, update the velocity of each particle using Eq. (13), and update the position of each particle using Eq. (14). Figure 14 shows a flowchart of the PSO algorithm.

Fig. 14 Flowchart of the PSO algorithm [41]

$$V_{p}^{(t+1)} = w \cdot V_{p}^{(t)} + c_{1} r_{1} \left( P_{best} - x_{p}^{(t)} \right) + c_{2} r_{2} \left( G_{best} - x_{p}^{(t)} \right)$$
(13)
$$x_{p}^{(t+1)} = x_{p}^{(t)} + V_{p}^{(t+1)}$$
(14)
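Equations (13) and (14) translate directly into code. The following Python/NumPy sketch performs one PSO update; the inertia weight and acceleration coefficients `w`, `c1`, and `c2` are typical values assumed here, not settings taken from the paper.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, c1=2.0, c2=2.0):
    """One PSO update for the whole swarm: Eqs. (13) and (14).

    x, v   : (n_particles, dim) current positions and velocities
    p_best : (n_particles, dim) best position found by each particle so far
    g_best : (dim,) best position found by the whole swarm
    """
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (13)
    x_new = x + v_new                                                # Eq. (14)
    return x_new, v_new
```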

3.3.3 The proposed technique based on hybrid GWO-PSO algorithm

As mentioned earlier, GWO is strong at exploitation but weak at avoiding premature convergence and local optima, while the PSO algorithm has a solid exploration capability but lacks exploitation. In this section, the hybrid GWO-PSO algorithm is proposed to combine the exploitation capability of GWO with the exploration capability of PSO and obtain better global optimization capability. The goal of the hybridization is to balance exploitation and exploration in order to extract the optimal subset of features and reduce the data dimensions by excluding irrelevant or redundant features. The hybrid GWO-PSO has proven to be an effective optimization technique when seeking the global best solution of an optimization problem.

Figure 15 shows the flowchart of the GWO-PSO algorithm. The proposed technique consists of the following steps: initializing the search agents and defining the solution space, running the GWO technique, obtaining the best (lowest fitness) values for all agents, passing these agents to the PSO technique as initial points, returning the modified positions to the GWO, and repeating these steps until the stopping criterion is reached.

Fig. 15 Flowchart of the GWO-PSO algorithm

GWO-PSO alternately uses the PSO algorithm for exploration of the search space and the GWO algorithm for exploitation to search for the global optimum without changing the general operation of the GWO. To implement this hybridization, the position of the next generation of wolves is updated using Eq. (15) instead of Eq. (12) of the GWO algorithm.

$$\vec{X}(t+1) = \vec{X}(t+1) + V_{p}^{(t+1)}$$
(15)
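As an illustration of Eq. (15), the sketch below chains the two updates above: the GWO step of Eq. (12) produces a candidate position, and the PSO velocity of Eq. (13) is then added to it. It reuses `gwo_step` from the GWO sketch, and again the parameters `w`, `c1`, and `c2` are assumed values rather than settings reported in the paper; the wrapper loop and the feature-selection fitness function are omitted.

```python
import numpy as np

def gwo_pso_step(x, v, p_best, g_best, fitness, a, w=0.7, c1=2.0, c2=2.0):
    """Hybrid update: GWO position estimate shifted by the PSO velocity (Eq. 15)."""
    x_gwo = gwo_step(x, fitness, a)                                  # Eqs. (6)-(12)
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (13)
    return x_gwo + v_new, v_new                                      # Eq. (15)
```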

3.4 Classification process

3.4.1 K-means technique

K-means is an unsupervised machine learning algorithm used to cluster and analyze data, first introduced by James MacQueen in 1967 [42]. It divides the records of the dataset, described by the selected features (the output of the feature selection process in this study), into separate groups or clusters. It seeks to partition the given dataset into a certain number (k) of sets based on their degree of similarity, making the data within a cluster as similar as possible while keeping the similarity between clusters low, thereby reducing the data's complexity. K-means is widely used in the data mining field and has many advantages, including simplicity of implementation, efficiency, and low memory consumption compared to other clustering techniques [43]. 'Clustering' refers to grouping points together because of their similarities, 'means' refers to averaging the data, and K refers to the number of centroids needed in the dataset.

As shown in Fig. 16, the first step of the clustering process is to identify the K initial centroids. K-means is normally used for clustering; however, in our case it is used as a classifier, since the clusters are known in advance: normal and attack. The K-means algorithm therefore separates the dataset into two classes depending on the distance between the data points, so the K value is two, referring to the number of clusters, either normal or abnormal (attack). For example, a higher average number of packets can be taken as an indicator of the anomalous cluster. Iterative calculations are then performed on the centroids until they have stabilized, meaning that no additional change in the centroid values occurs, either because the clustering has converged or because the defined number of iterations has been reached, as shown in Fig. 17.
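A minimal sketch of this use of K-means as a two-cluster classifier is given below (Python/scikit-learn for convenience; the paper's experiments used MATLAB). The rule for deciding which cluster represents attacks follows the packet-count heuristic mentioned above and is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_detect(X_selected: np.ndarray) -> np.ndarray:
    """Cluster the records described by the selected features into two groups."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_selected)
    # Heuristic from the text: the cluster with the larger mean feature values
    # (e.g., a higher average packet count) is treated as the attack cluster.
    attack_cluster = np.argmax(km.cluster_centers_.mean(axis=1))
    return (km.labels_ == attack_cluster).astype(int)  # 1 = attack, 0 = normal
```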

Fig. 16 Flowchart of K-means [44]

Fig. 17 Attack detection using the k-means algorithm

3.4.2 SVM

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for either classification or regression, developed in [45]. SVM aims to determine an optimal separating hyperplane (OSH) and is mainly used for classification problems. In the SVM algorithm, each data object is mapped as a point in n-dimensional space (where n is the number of features); classification is then performed by finding the hyperplane that best separates the two classes, as shown in Fig. 18. SVM is preferred by many because it provides high accuracy with less processing power [46]. Therefore, another experiment in this paper uses the SVM technique to measure classification performance.
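The SVM classification experiment can be sketched as follows (again in Python/scikit-learn rather than the MATLAB used in the paper); the RBF kernel and default regularization are assumptions, since the paper does not state the kernel settings.

```python
from sklearn.svm import SVC

def svm_detect(X_train, y_train, X_test):
    """Train an SVM on the selected features and predict normal (0) vs. attack (1)."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # common defaults; kernel choice is an assumption
    clf.fit(X_train, y_train)
    return clf.predict(X_test)
```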

Fig. 18 SVM algorithm [47]

4 Experimental results

The experimental results of the proposed technique are discussed in this section, and the assessment metrics described in Sect. 4.1 are used to evaluate the performance of the proposed method. MATLAB R2020b is used to implement the proposed approach; it is a powerful computational package based on a proprietary language that provides tools for users with a wide range of programming knowledge. The software package supports the project end-to-end, from graphical user interfaces that run the experiment to real-time data collection, analysis, and reporting, and it can perform calculations on large datasets that would be time-prohibitive in conventional row-and-column statistical packages. Table 4 shows the environment in which these experiments were run.

Table 4 Environment Specifications

4.1 Testing and analysis

This section provides a detailed evaluation and comparison of the proposed GWO-PSO technique against GWO and PSO when used to select the relevant features. The K-means and SVM algorithms were used in the classification process to illustrate the extent of improvement obtained with the proposed technique. The measurements are based on the assessment metrics below; Eqs. (16)–(18) are used to calculate the accuracy, detection rate, and false alarm rate [19]. Figure 19 shows the confusion chart values.

Fig. 19 Confusion chart

$$Accuracy=\left(\frac{true\, positive+true\, negative}{true\, positive+true\, negative+false\, positive+false\, negative}\right)$$
(16)
$$Detection\;Rate=\left(\frac{true\, positive}{true\, positive+false\, negative}\right)$$
(17)
$$False\;Alarm=\left(\frac{false\, positive}{true\, negative+false\, positive}\right)$$
(18)
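The following small helper shows how the three metrics are computed from the confusion-chart counts; the example values in the comment are taken from the GWO-PSO with SVM results reported later (Fig. 21 and Table 12).

```python
def ids_metrics(tp: int, tn: int, fp: int, fn: int):
    """Assessment metrics of Eqs. (16)-(18) computed from the confusion chart."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (16)
    detection_rate = tp / (tp + fn)              # Eq. (17)
    false_alarm = fp / (tn + fp)                 # Eq. (18)
    return accuracy, detection_rate, false_alarm

# Example with the GWO-PSO + SVM confusion chart of Fig. 21:
# ids_metrics(tp=12651, tn=9660, fp=51, fn=182) gives roughly 98.97% accuracy,
# a 98.58% detection rate, and a 0.53% false alarm rate.
```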

4.2 Results and discussion

In this section, the proposed technique was used to select the relevant features, as shown in Table 5; 20 features were chosen out of the 41 features in the original dataset mentioned in Sect. 3.2, representing 48.78% of the total features. The bold font in the given tables refers to the best result.

Table 5 List of selected features by GWO-PSO

Figure 20 shows the confusion chart for K-means results after using the appropriate features that resulted from the proposed GWO-PSO technique. The proposed method predicts 9455 records as regular records out of 9711 regular records, representing 97.363% as True Negative (TN). The proposed approach indicates 256 records as attack records out of 9711 regular records, representing 2.636% as False Positive (FP). The proposed technique predicts 5497 records as regular records out of 12,833 attack records, representing 42.834% as False Negative (FN). The proposed approach indicates 7336 records as attack records out of 12,833 attack records, representing 57.165% as True Positive (TP).

Fig. 20 Confusion chart of K-means using GWO-PSO selected features

Figure 21 shows the confusion chart for SVM results after using the appropriate features that resulted from the proposed GWO-PSO technique. The proposed method predicts 9660 records as regular records out of 9711 regular records, representing 99.475% as True Negative (TN). The proposed approach indicates 51 records as attack records out of 9711 regular records, representing 0.525% as False Positive (FP). The proposed technique predicts 182 records as regular records out of 12,833 attack records, representing 1.418% as False Negative (FN). The proposed approach indicates 12,651 records as attack records out of 12,833 attack records, representing 98.582% as True Positive (TP). Table 6 shows the results based on the assessment metrics: accuracy, detection rate, false alarm, process time, and feature number.

Fig. 21 Confusion chart of SVM using GWO-PSO selected features

Table 6 The results of proposed technique based on the assessment metrics

The GWO algorithm was used in this section to select the relevant features as shown in Table 7; 26 features were chosen out of 41 features in the original datasets that were mentioned in Sect. 3.2, which represent 63.414% of the total features.

Table 7 List of selected features by GWO

Figure 22 shows the confusion chart for K-means results after using the appropriate features that resulted from the GWO algorithm. The proposed technique predicts 9493 records as regular records out of 9711 regular records, representing 97.755% as True Negative (TN). The proposed approach indicates 218 records as attack records out of 9711 regular records, representing 2.245% as False Positive (FP). The proposed technique predicts 6061 records as regular records out of 12,833 attack records, representing 47.230% as False Negative (FN). The proposed approach indicates 6772 records as attack records out of 12,833 attack records, representing 52.770% as True Positive (TP).

Fig. 22 Confusion chart of K-means using GWO selected features

Figure 23 shows the confusion chart for SVM results after using the appropriate features that resulted from the GWO algorithm. The proposed technique predicts 9654 records as regular records out of 9711 regular records, representing 99.413% as True Negative (TN). The proposed method indicates 57 records as attack records out of 9711 regular records, representing 0.587% as False Positive (FP). The proposed technique predicts 288 records as regular records out of 12,833 attack records, representing 2.244% as False Negative (FN). The proposed approach indicates 12,545 records as attack records out of 12,833 attack records, representing 97.756% as True Positive (TP). Table 8 shows the results based on the assessment metrics: accuracy, detection rate, false alarm, process time, and feature number.

Fig. 23 Confusion chart of SVM using GWO selected features

Table 8 The results of GWO algorithm based on the assessment metrics

The PSO algorithm was used to select the relevant features as shown in Table 9; 24 features were chosen out of 41 features in the original datasets mentioned in Sect. 3.2, representing 58.536% of the total features. Figure 24 shows the confusion chart for K-means results after using the appropriate features that resulted from the PSO algorithm. The proposed technique predicts 9483 records as regular records out of 9711 regular records, representing 97.652% as True Negative (TN). The proposed method indicates 228 records as attack records out of 9711 regular records, representing 2.348% as False Positive (FP). The proposed technique predicts 5789 records as regular records out of 12,833 attack records, representing 45.110% as False Negative (FN). The proposed method indicates 7044 records as attack records out of 12,833 attack records, representing 54.890% as True Positive (TP).

Table 9 List of selected features by PSO
Fig. 24 Confusion chart of K-means using PSO selected features

Figure 25 shows the confusion chart for SVM results after using the appropriate features that resulted from the PSO algorithm. The proposed technique predicts 9666 records as normal records out of 9711 normal records, representing 99.537% as True Negative (TN). The proposed technique predicts 45 records as attack records out of 9711 normal records, representing 0.463% as False Positive (FP). The proposed technique predicts 287 records as normal records out of 12,833 attack records, representing 2.236% as False Negative (FN). The proposed technique predicts 12,546 records as attack records out of 12,833 attack records, representing 97.764% as True Positive (TP). Table 10 shows the results based on the assessment metrics: accuracy, detection rate, false alarm, process time, and feature number.

Fig. 25 Confusion chart of SVM using PSO selected features

Table 10 The results of PSO algorithm based on the assessment metrics

A comparison of GWO-PSO, GWO, and PSO in terms of the number of features is made. The proposed technique achieved the study's objective by reducing the number of features and selecting the relevant ones: 20 relevant features were chosen by GWO-PSO, compared with 24 and 26 relevant features selected by PSO and GWO, respectively, as shown in Fig. 26.

Fig. 26 Selected number of features by the comparative methods

Table 11 provides a comparison of the GWO-PSO-K-means, GWO-K-means, and PSO-K-means algorithms in terms of accuracy, detection rate, false alarm rate, and process time. The proposed GWO-PSO-K-means selected a smaller number of features with a higher accuracy rate compared to the other methods. As seen in Table 11, except for the false alarm rate relative to GWO, the proposed technique achieved its target of enhancing GWO with K-means. The accuracy of the proposed method reached 74.48%, compared with GWO and PSO, which obtained 72.15% and 73.31%, respectively, as shown in Fig. 27. The proposed technique achieved 57.17% in terms of detection rate, while GWO and PSO reached 52.77% and 54.89%, respectively, as shown in Fig. 28, where the proposed GWO-PSO-K-means again obtains the highest value.

Table 11 Comparison of GWO-PSO-K-means, GWO-K-means and PSO-K-means algorithms
Fig. 27 Accuracy results of the comparative methods using K-means

Fig. 28 Detection rate of the comparative methods using K-means

In terms of false alarm rate, the proposed technique achieved 2.64%, compared to GWO, which reached 2.24%, and PSO, which reached 2.35%, as shown in Fig. 29. In terms of processing time, PSO achieved the best time, followed by GWO-PSO-K-means, while GWO gave the longest processing time, as shown in Fig. 30. Table 12 shows a comparison of the GWO-PSO-SVM, GWO-SVM, and PSO-SVM algorithms in terms of accuracy, detection rate, false alarm rate, and process time. As seen in Table 12, the proposed technique achieved the objective of enhancing GWO and PSO with SVM, except for the false alarm rate relative to PSO. The proposed method achieved 98.97% accuracy, while GWO achieved 98.47% and PSO 98.52%, as shown in Fig. 31.

Fig. 29 False alarm rate of the comparative methods using K-means

Fig. 30 Process time (seconds) of the comparative methods using K-means

Table 12 Comparison of GWO-PSO-SVM, GWO-SVM and PSO-SVM algorithms
Fig. 31 Accuracy of the comparative methods using SVM

The proposed technique achieved a 98.58% detection rate, while GWO and PSO both achieved 97.76%, as seen in Fig. 32. In terms of false alarm rate, the proposed technique obtained 0.53%, compared to GWO, which reached 0.59%, and PSO, which achieved 0.46%, as seen in Fig. 33. In terms of processing time, the proposed technique achieved the best time, followed by PSO and finally GWO, as shown in Fig. 34.

Fig. 32 Detection rate of the comparative methods using SVM

Fig. 33 False alarm rate of the comparative methods using SVM

Fig. 34 Process time of the comparative methods using SVM

Table 13 summarizes all experimental results of GWO-PSO, GWO, and PSO using the SVM and K-means classification algorithms in terms of accuracy, detection rate, false alarm rate, and process time. The proposed GWO-PSO-K-means selected the same small number of features as the proposed GWO-PSO-SVM, but GWO-PSO-SVM achieved better accuracy than all other methods in Table 13. The same table also includes comparisons against state-of-the-art methods published in the literature (i.e., machine learning belief networks [48], deep belief networks [49], and KELM [50]) to validate the performance of the proposed method. The proposed method obtained better results than all other comparative methods, which proves its ability to select the optimal features for detecting intrusion data.

Table 13 The results of the proposed methods in terms of several evaluation measures

The enhancement ratio of the proposed technique over GWO-Kmeans, PSO-Kmeans, GWO-SVM, and PSO-SVM is given in this part of the results and summarized in Tables 14 and 15. All values in the tables below are determined using the following equation [51]: enhancement percentage = |old value − new value| / old value, as given in Eq. (19). As shown in Table 14, the proposed technique enhanced GWO-Kmeans by 3.229% in terms of accuracy, 8.338% in terms of detection rate, 16.186% in terms of processing time, and 23.076% in terms of feature number. It also enhanced PSO-Kmeans by 1.596% in terms of accuracy, 4.153% in terms of detection rate, and 16.666% in terms of feature number; Fig. 35 illustrates these ratios.
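The enhancement percentage can be computed with a one-line helper; the example below uses the K-means accuracy values reported earlier (72.15% for GWO-K-means and 74.48% for GWO-PSO-K-means) and reproduces the 3.23% figure in Table 14.

```python
def enhancement_percentage(old_value: float, new_value: float) -> float:
    """Enhancement ratio of Eq. (19): |old - new| / old, expressed as a percentage."""
    return abs(old_value - new_value) / old_value * 100

# Example with the K-means accuracies reported above:
# enhancement_percentage(72.15, 74.48) is approximately 3.23,
# matching the GWO-Kmeans accuracy entry in Table 14.
```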

Table 14 The enhancements of GWO-PSO-Kmeans
Table 15 The enhancements of GWO-PSO-SVM
Fig. 35 The enhancements of GWO-PSO-Kmeans

As shown in Table 15, the proposed technique enhanced GWO-SVM by 0.507% in terms of accuracy, 0.838% in terms of detection rate, 10.169% in terms of false alarm rate, 27.518% in terms of processing time, and 23.076% in terms of feature number. It also enhanced PSO-SVM by 0.456% in terms of accuracy, 0.8388% in terms of detection rate, 9.730% in terms of processing time, and 16.66% in terms of feature number. These ratios are illustrated in Fig. 36. It is clear that the proposed method obtains better results and also new best solutions for the given problems.

Fig. 36 The enhancements of GWO-PSO-SVM

5 Conclusions and future works

In this paper, GWO was improved through hybridization with the PSO algorithm, and this improvement is reflected in the level of protection offered by the IDS. The NSL-KDD dataset was used to test the proposed technique in terms of accuracy, detection rate, false alarm rate, processing time, and number of features. The results show this improvement: selecting the relevant features improved the classification process for both K-means and SVM. The proposed technique was compared with the original PSO and GWO separately to measure this improvement, and the results demonstrate that it outperforms the original PSO and GWO in terms of accuracy, detection rate, and number of features. The technique enhanced GWO-K-means by 3.23% in accuracy, 8.34% in detection rate, 16.19% in processing time, and 23.08% in feature number. It also improved PSO-K-means by 1.6% in accuracy, 4.15% in detection rate, and 16.67% in feature number. For the SVM classifier, the proposed technique enhanced GWO-SVM by 0.51% in accuracy, 0.84% in detection rate, 10.17% in false alarm rate, 27.52% in processing time, and 23.08% in feature number, and it enhanced PSO-SVM by 0.46% in accuracy, 0.84% in detection rate, and 16.67% in feature number. In the future, the Bagging (Bootstrap Aggregating) algorithm could be used as a classifier instead of K-means; it is one of the ensemble learning methods and can improve regression and classification accuracy and thus increase the detection rate in the WSN environment, especially for IDS. Other optimization algorithms, such as the Arithmetic Optimization Algorithm (AOA), can also solve the same problem. In future work, the proposed method can be applied to other optimization problems, such as data mining, task scheduling, wind energy, industrial engineering, benchmark functions, feature selection, image segmentation, and others.