Abstract
An Intrusion Detection Model (IDM) is an essential component of current network defence. Malicious users analyse the vulnerabilities of Intrusion Detection Systems (IDSs) to gain unauthorized access. Furthermore, intrusion detection involves numerous numerical attributes and models, which raises detection errors and triggers false alarms. Hence, computational intelligence should be incorporated into the IDM to achieve a high detection rate with fewer false alarms. To this end, a new hybrid IDM framework is developed that combines a Fuzzy Genetic Algorithm with Multi-Objective Particle Swarm Optimization to maximize detection accuracy, minimize false alarms, and reduce computational complexity; this is explained in the first phase. Existing IDSs are constrained by the information on which they were trained and incur false positives on normal user activity. The objective of this proposal is to extract optimal classification rules automatically from training data so that attack types, including unknown ones, are identified correctly. To achieve this goal, Multi-Objective Particle Swarm Optimization (MOPSO) is used as the classifier to improve the identification of rare attack classes within the IDM. The effectiveness of this method lies in its capacity to leverage information within an unfamiliar search space, guiding subsequent searches towards valuable subspaces. It provides better separability of the various classes, i.e. normal behaviour and false alarms.
In this FGA-MOPSO model, Principal Component Analysis (PCA) serves as the feature selection technique used to identify pertinent features within the dataset, thereby enhancing the classifier's performance. The Fuzzy Genetic Algorithm (FGA) creates new populations for training the classifier through three operations, namely selection, crossover, and mutation, which exposes the classifier to more patterns in the training phase. The simulation illustrates that the system can speed up the training and testing of intrusion detection, which is important for network applications.
1 Introduction
The emergence of wireless networking relies significantly on self-organized, multi-hop network environments. A wireless sensor network (WSN) aggregates a large number of sensor nodes through wireless communication and is characterized by simple, low-cost deployment [1]. It is extensively adopted in real-time settings such as military exploration, modern logistics, and environment perception, where the connected sensor nodes work collaboratively to detect, monitor, and track malicious nodes or intruders over the network [2]. Specifically, WSN-based intrusion detection systems are used to handle security issues encountered during post-disaster rescue, region monitoring, and border patrol, and have become a general field of modern research. Such systems need constant monitoring and tracking to predict intrusions, and therefore need a design that handles these multi-objective constraints to attain high-quality, persistent handling of the intruder [3].
Present investigations of intrusion detection fall into two categories. The first performs trace prediction and accurate localization of the target by adaptively sensing information from diverse nodes, based on local voting and decision fusion approaches [4]. The second relies on movement and deployment strategies for sensor nodes (SNs) to attain enhanced dynamic target coverage. It is considered an extension of conventional coverage optimization problems and is the specific concern of this work [5]. Coverage quality is strongly influenced by the initial deployment of the SNs. However, owing to hostile or remote sensing environments, for example in region monitoring or border patrol, sensors are not deployed manually in most real-time environments [6]. Instead, the sensors are usually scattered from aircraft, and the landing position cannot be controlled owing to wind and obstacles such as mountains and trees. Consequently, certain sub-areas lack appropriate sensor coverage: sensors are missing in some places, and some regions suffer from coverage issues (regions that do not fall within the coverage region) [7].
Generally, it is crucial to resolve these issues, and adding sensors for predicting intrusions can be achieved only with the adoption of miniaturized robots and embedded hardware. Some sensors possess sensing competency similar to that of static sensors and can additionally move towards appropriate locations to offer optimal coverage after node deployment [8]. Regrettably, these nodes are not capable of tracking and predicting intruders to enhance coverage quality. The situation is worse still with the emergence of anti-reconnaissance methods among intruders in real-world environments. Such an intruder is equipped with sensing devices, obtains location information about the detection nodes, and plans its actions to evade detection. These intruders are described as 'empowered intruders'; they differ from ordinary intruders and are difficult for the SNs to track. Thus, designing effective intrusion detection approaches for this sort of intruder is a challenging task [9].
Conventional intrusion detection approaches for region monitoring or border patrol rely on a centralized network architecture. The detection or intermediate nodes transfer information to the cluster nodes or base station, which takes the necessary action after processing or analysing the information [10]. This method requires recurrent interaction among the cluster nodes, base station, and detection nodes. It occupies a large number of network nodes and increases the network's transmission delay. It therefore leads to delayed handling of events such as interruptions or intruder prediction [11]. Consequently, the conventional centralized framework is unsuitable for some real-time scenarios, particularly those involving highly capable intruders. The nodes have to maintain records of the process to perform local computation and trajectory tracking in the real-time environment [12]. Moreover, the nodes do not possess sufficient efficiency to deal with these problems.
In the modern era of computational intelligence, various non-classical approaches work like human beings, learning tasks from observations or data [13]. Such intelligent systems possess characteristics that make them feasible for constructing effective models in diverse fields. These features include fault tolerance, high computational speed, error resilience, and adaptability when modelling noisy information [14]. This research work considers Fuzzy Logic (FL), an intelligence technique inspired by human reasoning under uncertainty. It is also regarded as a logic system or rule-based system that tolerates uncertainty and imprecision, and it therefore performs rule-based classification effectively. However, it is not self-adaptive, which makes it a candidate for optimization. Here, Particle Swarm Optimization, which is popular for handling multi-objective constraints, is considered and combined with a Genetic Algorithm (GA) for its global optimization ability. Thus, this work models a novel Fuzzy Genetic Algorithm with Multi-Objective Particle Swarm Optimization that maximizes detection accuracy, minimizes false alarms, and reduces computational complexity. The proposed model is tested and validated, showing superior accuracy, a lower False Alarm Rate (FAR), and improved classification accuracy for certain attacks. The features are chosen and analysed using Principal Component Analysis (PCA). The data source is the publicly available NSL-KDD dataset. The simulation takes place within the MATLAB environment, using metrics such as accuracy, precision, and FAR.
The structure of the work is as follows: Sect. 2 comprises an in-depth survey of various existing approaches related to IDS, along with their associated pros and cons. Section 3 elaborates on the methodology in a broader sense, focusing on gaining insight into the prediction model. In Sect. 4, the discussion revolves around the results obtained from model evaluation, presented graphically. Finally, Sect. 5 presents the conclusion of the work, along with suggestions for future improvements.
2 Related Works
This section reviews recent developments in the data taxonomy, together with up-to-date research ideas on IDS and the classification systems used for this prediction taxonomy. It offers a comprehensive and structured overview of prevailing IDSs, so that the research is informed by the key factors in anomaly detection.
Osanaiye et al. [15] discuss signature-based IDS (SIDS), which uses pattern matching approaches to detect attacks. It is also termed misuse detection or knowledge-based detection. In this model, matching approaches are used to detect intruders: when an observed intrusion signature matches a signature stored in the signature database, an alarm is triggered. In SIDS, the host logs are examined to find sequences of commands or actions previously identified as malware. Li et al. [16] discuss conventional approaches that inspect network packets and attempt to match them against signature databases. However, these approaches are unable to detect attacks that span multiple packets. Because modern malware is sophisticated and spread over multiple packets, signature information must be extracted across packets, which requires the IDS to recall content from several packets. In general, diverse methods are used to create IDS signatures, including state machines, semantic conditions, and formal-language string patterns.
Zhou et al. [17] discuss the significant benefits of IDSs that can predict zero-day attacks, since the detection of abnormal user behaviour does not depend on a signature database. Such a system raises a danger signal when the observed behaviour deviates from the usual characteristics. It has several advantages. First, it can detect internal malicious activity: when an intruder starts transactions from a stolen account that do not match typical user activity, an alarm is triggered. Second, because the profiles are customized, it is extremely difficult for a cyber-criminal to learn what constitutes a user's normal behaviour without raising an alert. Almomani et al. [18] discuss the categories of IDS methods, namely machine-learning-based, knowledge-based, and statistics-based approaches. The last involves collecting and examining data records over a set of items and constructing a statistical model of normal user characteristics. The knowledge-based model attempts to identify the essential activities from existing system data such as network traffic instances and protocol specifications. Machine-learning approaches, in turn, need complex pattern matching over the training data.
Ioonnou et al. [19] discuss various machine learning approaches. Machine learning is a process of extracting knowledge from large amounts of data. A model is composed of a set of rules, complex transfer functions, and methods used to identify essential data patterns and to predict or examine behaviour. Learning approaches are used widely in the field of IDS. Techniques and algorithms such as neural networks (NN), decision trees (DT), clustering, association rules, GA, and k-nearest neighbours (k-NN) are adopted for learning knowledge from intrusion datasets. Ghosal et al. [20] discuss an approach to feature selection that integrates correlation attribute evaluation and Information Gain. The authors validate the selected features by applying diverse classifiers, namely Naive Bayes (NB), C4.5, NB-tree, and multi-layer perceptron (MLP). Almomani et al. [21] apply genetic-fuzzy rule mining to evaluate the significance of IDS characteristics. Ke et al. [22] discuss an IDS that adopts Random Forest to enhance prediction accuracy, and Arun et al. [23] discuss how to reduce the FAR. Khraisat et al. [24] propose a classification approach on the NSL-KDD dataset that uses the DT algorithm to design a model with certain metrics and examine the significance of DT approaches.
Ali et al. [25] discuss the Support Vector Machine (SVM), a classifier defined by a separating hyperplane. It adopts a kernel function to map the training data into a high-dimensional space, where intrusions can be classified linearly. It is well known for its generalization ability and is particularly valuable when the number of attributes is large and the number of data points is small. Different separating hyperplanes are obtained with kernel functions such as the hyperbolic tangent, Gaussian radial basis function (RBF), linear, and polynomial kernels. In IDS datasets, some features are redundant or have little influence on separating the data points into the appropriate classes; thus, feature selection is performed during SVM training. SVM is also adopted for classification into multiple classes. Buczak et al. [26] describe an SVM with an RBF kernel used for categorizing the KDD'99 dataset into pre-defined classes. From the 41 available attributes, a feature subset is carefully chosen using feature selection approaches.
Peng et al. [27] describe the k-NN classifier, a typical non-parametric classifier applied in ML. The idea is to assign an unlabelled data sample to the class of its k nearest neighbours. Here, 'k' is an integer that specifies the number of neighbours; generally, k = 5 in most cases. Let 'x' denote the unlabelled data instance to be categorized. If, among its five nearest neighbours, three belong to the intrusion class and two to the normal class, majority voting assigns 'x' to the intrusion class. Ibrahim et al. [28] propose a novel fuzzy-based learning model that uses unlabelled samples together with a supervised learning model to improve IDS classifier performance [29, 30]. The SH-FFNN model is trained to output a fuzzy membership vector, and the unlabelled samples are categorized by fuzziness (high, mid, and low) using fuzzy quantifiers. The classifier is then re-trained after each category is separately incorporated into the original training set. The experimental results for semi-supervised intrusion detection on the NSL-KDD dataset show that unlabelled samples with high and low fuzziness contribute most to improving IDS prediction accuracy in contrast to conventional approaches.
This section presented a detailed review of IDS methods, their types, and methodologies, with their significant advantages and constraints. Various machine learning approaches are used for predicting malicious activities and intruders over sensor networks. However, some of these approaches are constrained when generating and updating data about newer attacks, and they yield a high FAR or low accuracy. The results and methods are summarized, and contemporary models are explored in terms of the performance enhancements they bring to IDS in order to overcome these issues.
3 Methodology
Here, the proposed fuzzy genetic algorithm and MOPSO model are discussed in detail so that their performance can be validated. Preliminary steps such as data acquisition, feature selection, and classification are performed to identify intrusions over the network. The detection framework is shown in Fig. 1.
3.1 Dataset Description
In this context, the NSL-KDD dataset is employed: its 20% training subset, comprising 25,192 instances, serves as training data, while a further 22,544 instances constitute the testing dataset. This dataset comprises 42 attributes, with 41 of them classified into four distinct classes.
1. Basic (B) characteristics: TCP/IP connection attributes utilized in identifying delays.
2. Traffic (T) characteristics: These attributes pertain to window intervals and encompass two prominent features, namely, same service and same host. The service feature evaluates the overall number of connections sharing the same services within a specific time frame.
3. Host (H) characteristics: These attributes are assigned to assess attacks lasting for 2 s, scrutinizing the overall connections directed towards the destination during this duration.
4. Content (C) characteristics: These attributes, informed by domain expertise, are suggested based on moment intervals.
This dataset encompasses four distinct traffic categories, each associated with 23 types of attacks, along with various features:
1. Denial of Service (DoS): Attackers monopolize network resources, rendering them unavailable to legitimate users.
2. User-to-Root (U2R): Attackers intercept passwords and exploit vulnerabilities on hosts to gain unauthorized access as legitimate users.
3. Remote-to-Local (R2L): Attackers transmit messages from remote locations to hosts, exploiting vulnerabilities in the process.
4. Probe: Attackers scan the network to gather information, leading to network breaches.
Tables 1 and 2 detail the dataset's records, labels, and attributes from the NSL-KDD dataset, while Table 3 delineates the four distinct attack categories.
3.2 Feature Selection Using Principal Component Analysis
PCA is a statistical approach applied in applications such as image compression, face recognition, and image processing. It is a common approach for finding patterns in high-dimensional data. It works on a large dataset and analyses the relationships among the individual data points (see Table 4). The objective of PCA is to reduce data dimensionality while capturing the variation found in the original NSL-KDD dataset. It identifies data patterns by expressing the differences and similarities within the dataset.
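The dimensionality reduction described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation: the toy data, the function names, and the use of power iteration to obtain only the first principal component are all assumptions made for the example.

```python
import math

def center(rows):
    """Subtract each feature's mean so every column has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_component(cov, iters=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v

def project(centered, component):
    """Score of every record along one principal component."""
    return [sum(r[j] * component[j] for j in range(len(component))) for r in centered]

# Hypothetical toy records with three numeric features (not NSL-KDD data).
rows = [[2.0, 4.1, 0.5], [3.0, 6.0, 0.4], [4.0, 7.9, 0.6], [5.0, 10.1, 0.5]]
centered = center(rows)
cov = covariance(centered)
eigval, pc1 = leading_component(cov)
# Share of the total variance captured by the first component.
explained = eigval / sum(cov[i][i] for i in range(len(cov)))
scores = project(centered, pc1)
```

In this toy data the first two features are almost perfectly correlated, so a single component captures nearly all of the variance; on a real dataset one would keep as many components as needed to reach a chosen variance threshold.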
3.3 Design of Fuzzy Genetic Algorithm
A classifier is the algorithm used to construct a classification model from the provided dataset in order to categorize the data. The model is governed by parameters such as the fuzzy sets, fuzzy rules, membership functions, and prioritization values. Fuzzy logic by itself lacks learning ability, which makes its optimization complex. Here, the fuzzy rules, membership functions, and fuzzy sets are optimized. The fuzzy rule set is specified by IF–THEN rules. The rule size depends on the feature size and is thus determined by the dataset adopted. Moreover, to handle classification ignorance, the number of rules is constrained. Membership functions and fuzzy sets are feature-dependent. The membership function can be either trapezoidal or triangular. Three fuzzy sets are considered to reduce computational complexity. The inference process maps the fuzzified input onto the rule base to generate a fuzzified output for every applicable rule. The rule is generated based on the following Eq. (1):
Here, \(\alpha_{Ri}\) is the strength of the \(i^{th}\) fuzzy rule \(R_{i}\), \(n\) is the number of features, \(d_{1} , \ldots ,d_{n}\) are the input variables, \(\mu_{Di} \left( {d_{i} } \right)\) is the fuzzified membership degree, and \(\mu_{Di}\) is the fuzzy set membership function. A single fuzzy value is allocated to each output. The final value is related to the output using the maximum operator, as expressed in Eq. (2):
Here, \(\beta_{i}\) is the maximal value over all fuzzy rules, \(\alpha_{Ri}\) is the fuzzy rule strength, and \(M\) is the total number of fuzzy rules. The defuzzification process evaluates the centroid and transforms the fuzzy output into crisp values using the fuzzy rules. It is expressed in Eq. (3):
Here, \(\alpha_{Ri} * \mu_{Di} \left( {d_{i} } \right)\) is the maximal defuzzification process and \(n\) is the total number of fuzzy rules. The parameters are tuned with the Genetic Algorithm, and the resulting model is used to predict and classify attacks. Algorithm 2 illustrates the genetic fuzzy algorithm.
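The inference steps described above (membership evaluation, rule strength, maximum aggregation, and centroid defuzzification) can be sketched as follows. This is only an illustration under stated assumptions: the minimum operator for rule strength, the breakpoints of the three triangular fuzzy sets, the example rules, and the class centres are hypothetical and are not taken from the paper's Eqs. (1) to (3).

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Three fuzzy sets per feature on a [0, 1] normalised scale (hypothetical breakpoints).
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

def rule_strength(rule, sample):
    """Firing strength of one IF-THEN rule: minimum membership degree (assumed)."""
    return min(triangular(sample[i], *FUZZY_SETS[label]) for i, label in rule["if"])

def infer(rules, sample):
    """Aggregate rule strengths per output class with the maximum operator."""
    beta = {}
    for rule in rules:
        strength = rule_strength(rule, sample)
        beta[rule["then"]] = max(beta.get(rule["then"], 0.0), strength)
    return beta

def defuzzify(beta, centers):
    """Centroid-style weighted average of the class centres; None if no rule fires."""
    total = sum(beta.values())
    if total == 0.0:
        return None
    return sum(beta[c] * centers[c] for c in beta) / total

# Hypothetical rule base over two normalised features.
RULES = [
    {"if": [(0, "high"), (1, "high")], "then": "attack"},
    {"if": [(0, "low"), (1, "low")], "then": "normal"},
    {"if": [(0, "medium"), (1, "high")], "then": "attack"},
]
CENTERS = {"normal": 0.0, "attack": 1.0}

sample = [0.8, 0.9]
beta = infer(RULES, sample)         # attack fires with strength about 0.6, normal with 0.0
crisp = defuzzify(beta, CENTERS)    # about 1.0, i.e. fully on the attack side
```

Restricting each feature to three fuzzy sets keeps the rule base small, which is the complexity argument made in the text; other t-norms (for example the product) could replace `min` without changing the structure.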
The genetic algorithm encodes the fuzzy rules, and the chromosomes are modelled to encode the rule base. The fuzzy rules are specified as an integer array whose size equals the number of features chosen from the NSL-KDD dataset. The encoding specifies, for each dataset feature, the membership function used in the chosen rule base. The fitness of an encoded chromosome is evaluated with the fuzzy sets and the chromosomes. The classification accuracy is expressed in Eq. (4) and Eq. (5):
Here, \(E\) is the percentage of incorrectly classified records. The classification error is specified in quadratic form. Roulette wheel selection is used to select the parents for reproduction. During reproduction, crossover is applied to randomly chosen chromosome pairs. The chromosomes have a fixed length under a constrained environment. Random mutation is applied with a mutation selection probability. Elitism preserves the best solution and helps construct the successive generation: the older population is replaced while the fittest candidates are carried over into the next generation. The relationships among the chromosomes are attained through the collaboration of \(K\) rules to predict the attack categories. Figure 2 illustrates the flow diagram of the proposed MOPSO.
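The selection, crossover, mutation, and elitism steps named above can be sketched as a small genetic loop. This is a hedged illustration rather than the authors' Algorithm 2: the gene values (0 for "feature unused", 1 to 3 for the three fuzzy sets), the `toy_fitness` function, the target rule, and all parameter values are hypothetical stand-ins, since the paper's fitness in Eqs. (4) and (5) depends on classification error over the training data.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]

def crossover(a, b, rng):
    """Single-point crossover of two fixed-length chromosomes."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rate, n_sets, rng):
    """Replace each gene by a random value in 0..n_sets with probability `rate`."""
    return [rng.randint(0, n_sets) if rng.random() < rate else g for g in chrom]

def evolve(population, fitness_fn, rng, generations=30, rate=0.05, n_sets=3):
    """Generational GA with elitism: the best chromosome always survives unchanged."""
    for _ in range(generations):
        fitnesses = [fitness_fn(c) for c in population]
        best_idx = max(range(len(population)), key=lambda i: fitnesses[i])
        next_pop = [population[best_idx][:]]            # elitism
        while len(next_pop) < len(population):
            p1 = roulette_select(population, fitnesses, rng)
            p2 = roulette_select(population, fitnesses, rng)
            c1, c2 = crossover(p1, p2, rng)
            next_pop.append(mutate(c1, rate, n_sets, rng))
            if len(next_pop) < len(population):
                next_pop.append(mutate(c2, rate, n_sets, rng))
        population = next_pop
    return max(population, key=fitness_fn)

# Hypothetical stand-in for the real fitness: genes that agree with a target rule.
TARGET = [1, 3, 2, 0, 3, 1]

def toy_fitness(chrom):
    return sum(1 for g, t in zip(chrom, TARGET) if g == t) + 0.01

rng = random.Random(42)
population = [[rng.randint(0, 3) for _ in TARGET] for _ in range(12)]
initial_best = max(toy_fitness(c) for c in population)
best = evolve(population, toy_fitness, rng)
```

Because the elite chromosome is copied into every new generation before any mutation, the best fitness can never decrease from one generation to the next, which is the role the text assigns to elitism.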
3.4 Multi-Objective Particle Swarm Optimization (MOPSO)
PSO is a bionic concept that originates from the behaviour of bird flocks; the basic idea is to find the optimal solution through information sharing and cooperation among the individuals of a group. The speed and position of each bird are treated as independent variables, and the food density corresponds to the function value. During the search, each bird adjusts its speed and direction based on the difference between its current location and the best locations in its own and the population's history. The entire swarm thereby moves towards the optimal location, i.e. the search converges to an optimal solution. The main benefits of PSO are:
1. Strong global search capability and fast computational speed.
2. It is not very sensitive to the population size, which has only a small effect on the training speed.
3. No gradient information needs to be computed when optimizing the objective function, and no constraints of connectivity, differentiability, convexity, or continuity are imposed on the feasible region of the objective function.
Multi-objective PSO aims to solve problems from various domains efficiently. It is conceptualized as a random search across a D-dimensional space, aiming to optimize the objective function. Here, the population consists of \(n\) particles \(p_{i} = \left( {p_{i1} , p_{i2} , \ldots ,p_{iD} } \right)^{T}\), and the \(i^{th}\) particle is composed of a \(d\)-dimensional position vector \(x_{i} = \left( {x_{i1} , x_{i2} , \ldots ,x_{id} } \right)^{T}\) and a velocity vector \(v_{i} = \left( {v_{i1} , v_{i2} , \ldots ,v_{id} } \right)^{T}\). For every particle in the population, a fitness value is obtained by evaluating the particle's fitness. The fitness function is expressed in Eq. (6):
Here, \(\alpha\) is a hyper-parameter, \(p\) relates to the classifier performance, and \(N_{f}\) is the feature subset. To search the \(D\)-dimensional space, the particles are initialized randomly and the optimal solution is determined by iteration. During the search, the best position found by particle \(i\), \(p_{i} = \left( {p_{i1} , p_{i2} , \ldots ,p_{id} } \right)^{T}\), is the local optimal solution, and its velocity is \(v_{i} = \left( {v_{i1} , v_{i2} , \ldots ,v_{id} } \right)^{T}\). The best position found by the whole swarm, \(p_{g} = \left( {p_{g1} , p_{g2} , \ldots ,p_{gd} } \right)\), is the global optimal solution. In every iteration, each particle updates its velocity and position using the two optimal solutions \(\left( {p_{i} , p_{g} } \right)\). The update is expressed in Eq. (7):
Here, \(N\) is the total number of particles in the population in the \(d\)-dimensional space, \(t\) is the current iteration, and \(\omega\) is the non-negative inertia factor that balances the local and global optimization capabilities: when its value is larger, the global optimization capability is stronger and the local optimization capability is weaker. \(v_{id} \left( t \right)\) and \(v_{id} \left( {t + 1} \right)\) are the current and updated particle velocities; \(c_{1}\) and \(c_{2}\) are acceleration factors, with \(c_{1} = c_{2} = 2\). \(r_{1}\) and \(r_{2}\) are random numbers that increase particle randomness and avoid blind search. The particle position and velocity are constrained to \(\left[ { - x_{\max } , x_{\max } } \right]\) and \(\left[ { - v_{\max } , v_{\max } } \right]\). The algorithm for multi-objective PSO is given in Algorithm 3:
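As a minimal sketch of the update step described above, the following uses the textbook PSO rule with inertia \(\omega\), acceleration factors \(c_1 = c_2 = 2\), random factors \(r_1, r_2\), and clamping of position and velocity. It is not the authors' Algorithm 3: it optimizes a single hypothetical objective (`sphere`) rather than the multi-objective fitness of Eq. (6), and the inertia value, swarm size, bounds, and seed are assumptions chosen for the example.

```python
import random

def clamp(value, limit):
    """Restrict value to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=2.0, c2=2.0,
        x_max=5.0, v_max=1.0, seed=0):
    """Minimise `objective` with a basic particle swarm; returns best point, value, history."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-x_max, x_max) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                 # best position seen by each particle
    p_val = [objective(x) for x in xs]
    g_idx = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]   # best position seen by the swarm
    history = [g_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocity = (w * vs[i][d]
                            + c1 * r1 * (p_best[i][d] - xs[i][d])
                            + c2 * r2 * (g_best[d] - xs[i][d]))
                vs[i][d] = clamp(velocity, v_max)
                xs[i][d] = clamp(xs[i][d] + vs[i][d], x_max)
            value = objective(xs[i])
            if value < p_val[i]:
                p_best[i], p_val[i] = xs[i][:], value
                if value < g_val:
                    g_best, g_val = xs[i][:], value
        history.append(g_val)
    return g_best, g_val, history

def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_val, history = pso(sphere, dim=3)
```

A larger `w` keeps particles moving along their previous direction and so favours global exploration, while a smaller `w` lets the attraction towards the personal and global bests dominate, which matches the trade-off stated in the text.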
4 Results and Analysis of Data
This section presents the numerical results and discussion of the proposed MOPSO model. The simulation is conducted within the MATLAB environment, evaluating various performance metrics. The NSL-KDD dataset is utilized for training, testing, and validation in intrusion detection. The data prediction encompasses four distinct cases: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), with their corresponding analyses provided below.
1. TP: Indicates cases where both the predicted and actual labels are positive.
2. FN: Denotes instances where the predicted label is negative despite the actual labels being positive.
3. TN: Represents scenarios where both the predicted and actual values are negative.
4. FP: Refers to situations where the predicted label is positive despite the actual label being negative.
Table 5 depicts the confusion matrix of the proposed model. Based on the above definitions, metrics such as False Alarm Rate (FAR), accuracy, and Detection Rate (DR) are measured to evaluate the IDS scheme. They are discussed below:
1. Detection Rate (DR): The proportion of actual positive instances that are correctly predicted; it is a coverage measure of the classifier's ability to detect positive instances. This is given in Eq. (9):
$$DR = \frac{TP}{TP + FN} \tag{9}$$
2. Accuracy: The proportion of correct predictions relative to the total number of samples; it measures the overall accuracy rate over the classified samples. This is given in Eq. (10):
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$
3. False Alarm Rate (FAR): The proportion of actual negative instances that are incorrectly predicted as positive. This is given in Eq. (11):
$$FAR = \frac{FP}{TN + FP} \tag{11}$$
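The three metrics in Eqs. (9) to (11) follow directly from the confusion-matrix counts. The sketch below computes them for a small, made-up list of labels; the label names, the helper functions, and the example data are illustrative assumptions and are not results from the paper.

```python
def confusion_counts(actual, predicted, positive="attack"):
    """Return (TP, FN, TN, FP), treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def detection_rate(tp, fn):
    """Eq. (9): share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(tp, fn, tn, fp):
    """Eq. (10): share of all samples that were classified correctly."""
    total = tp + fn + tn + fp
    return (tp + tn) / total if total else 0.0

def false_alarm_rate(fp, tn):
    """Eq. (11): share of actual negatives that were flagged as positive."""
    return fp / (tn + fp) if (tn + fp) else 0.0

# Made-up labels for eight connections.
actual = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "attack"]
predicted = ["attack", "attack", "normal", "normal", "attack", "normal", "normal", "attack"]

tp, fn, tn, fp = confusion_counts(actual, predicted)   # 3, 1, 3, 1
dr = detection_rate(tp, fn)                            # 0.75
acc = accuracy(tp, fn, tn, fp)                         # 0.75
far = false_alarm_rate(fp, tn)                         # 0.25
```

Note that DR and FAR use different denominators (actual positives versus actual negatives), so a classifier can have a high detection rate and still raise many false alarms on a dataset dominated by normal traffic.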
Table 6 compares the prediction accuracy and FAR of the proposed MOPSO with existing ML approaches. The accuracy of the proposed MOPSO is 98.86%, which is 12.06% higher than PSO with lightweight GBM, 13.36% higher than decision tree, 15.76% higher than logistic regression, 16.8% higher than NB, 17.46% higher than multi-layer perceptron, 17.41% higher than ANN, and 20.46% higher than EM clustering (see Fig. 3). Similarly, the FAR of MOPSO is 9.5, which is 1.1, 6.2, 8.9, 9, 11.6, 11.8, and 14.2 lower than that of the other approaches, respectively. Table 7 reports the total training and testing time on the NSL-KDD dataset in terms of elapsed time and CPU time. For training, the elapsed time is 11.52 s and the CPU time is 0.30 s. For testing, the elapsed time is 2.689 s and the CPU time is 0.035 s (see Fig. 4).
Table 8 shows further metrics of the proposed MOPSO, namely precision, recall, F1-score, and FAR. For the normal category, the precision is 0.947, the recall is 0.995, the F1-score is 0.968, and the FAR is 0.015. For the attack category, the precision is 0.999, the recall is 0.987, the F1-score is 0.993, and the FAR is 0.007. The weighted averages of these metrics are 0.986, 0.987, 0.989, and 0.008 respectively (see Fig. 5).
Table 9 depicts the precision, recall, F1-score, and FAR for the attack categories DoS, probe, R2L, and U2R. For the DoS attack, the precision stands at 0.9940, recall at 0.9790, F1-score at 0.9860, and FAR at 0.00450. For the probe attack, precision is 0.8600, recall is 0.8855, F1-score is 0.9195, and FAR is 0.5715. For the R2L attack, precision is 0.6920, recall is 0.9195, F1-score is 0.7895, and FAR is 0.00550. Lastly, for the U2R attack, precision is 0.8880, recall is 0.5715, F1-score is 0.6965, and FAR is 0.00002. The weighted averages of these metrics are 0.99, 0.9886, 0.9988, and 0.0996 respectively (see Fig. 6).
The execution time (training and testing, in ms) of the proposed MOPSO is compared with PSO-lightweight GBM, DT, and logistic regression in Table 10. The training time of MOPSO is 95.4565 ms, which is 93.5735 ms, 5.0002 ms, and 124.1083 ms lower than these approaches, respectively. The testing time of MOPSO is 2.5465 ms, a reduction of 0.505 ms, 2.3489 ms, and 9.7895 ms compared with the same approaches (see Fig. 7). Based on these metrics, the proposed model works efficiently for predicting intrusions over the network, with the lowest FAR and the highest prediction accuracy among the compared approaches.
5 Conclusion
In this work, a novel model combining a Fuzzy Genetic Algorithm with Multi-Objective Particle Swarm Optimization is designed for predicting normal and attack traffic and is evaluated for execution time. It covers both major and minor attack categories, with particular attention to the rare classes in the NSL-KDD dataset. The model comprises three essential steps, namely feature selection, classification and optimization, so that the results on the dataset can be interpreted in a way that supports human understanding and data analysis. The proposed model is compared with several existing approaches. Experimental results show that the proposed model effectively extracts an appropriate rule-based model from network traffic, largely owing to MOPSO. Moreover, the assessed performance metrics show how well the proposed model meets its objectives regarding exploitation and exploration, rule evolution, and detection of attack categories, with a higher detection rate and lower FAR than the other approaches. Overall, the model attains 98.86% accuracy, 9.5% FAR, 99% precision, 98.86% recall and 99.88% F1-score.
The efficient classification and detection of normal network traffic and intrusion attacks offers considerable scope for future work. Building on this model, the improved approach can be applied to other complex problem domains such as DNA computation. In addition, other optimization approaches are candidates for attaining higher accuracy in this domain.
Data Availability
The data set used in this article is openly available as the NSL-KDD dataset at www.unb.ca/cic/datasets/nsl.html
Funding
The authors have not disclosed any funding.
Author information
Contributions
Both authors, Arun Kumar Ramamoorthy and K. Karuppasamy, contributed to the study conception and design, material preparation, data collection and analysis.
Ethics declarations
Conflict of interest
Dr. Arun Kumar Ramamoorthy declares that he has no conflict of interest. Dr. K. Karuppasamy declares that he has no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ramamoorthy, A.K., Karuppasamy, K. Unified Intrusion Detection Framework: Predictive Analysis of Intrusions in Sensor Networks. Wireless Pers Commun 137, 1559–1580 (2024). https://doi.org/10.1007/s11277-024-11396-6
DOI: https://doi.org/10.1007/s11277-024-11396-6