1 Introduction

According to the World Health Organization, interpersonal violence is a leading cause of impaired quality of life and of mortality worldwide, especially among people aged 15 to 44 years [1]. Interpersonal abuse and violence form a pattern of behavior used to establish power and control over another person through fear and intimidation, often including the threat or use of violence [2]. Abuse can take many forms, such as verbal or emotional, physical, sexual, and digital abuse; stalking (online and in person); and economic abuse. These forms of abuse can put public security at risk and damage the victim's personality and self-esteem. Physical assaults, stalking, and verbal insults occur in real time but often go unreported or receive a delayed response. A person can be abused physically, sexually, or verbally, in person and in real time, so it is imperative to detect and record such acts and assess their severity. The Internet of Things (IoT) is one of the most disruptive technologies to emerge in recent history and can potentially help in apprehending instances of real-time abuse (RTA). IoT devices such as smart wearables have emerged as powerful monitoring systems for detecting psychological state changes that might be triggered by physical harm or fighting [3,4,5]. Surveillance systems can also be used to detect vaping, substance abuse, real-time bullying, molestation, and cyber-stalking incidents; for example, sensors that capture elevated sound levels can help detect fighting or bullying. At the same time, technology means that abuse is no longer limited to schoolyards, street corners, or public places. Cyberspace has been recognized as a conducive environment for the use of various hostile, direct, and indirect behavioral tactics to target individuals or groups [6,7,8].

An alarming trend in the abuse of everyday technology has recently been observed, in which perpetrators misuse networked devices and software to control, isolate, humiliate, and dominate their victims. Collectively referred to as technological abuse or technology-facilitated abuse, it ranges from online harassment, stolen online identities, hacking, spoofing, and revenge pornography to stalking and surveillance.

While the connected world brings consumers vast benefits, it also invites attackers to continuously look for new exploits and attack techniques designed to circumvent the security of IoT networks. Because IoT devices are connected to the global Internet, they are exposed to remote intrusion in addition to wireless attacks from inside the IoT network. Thus, although networked devices provide many advantages, they also offer abusers an abundance of opportunities to control, harass, and stalk their victims. Unfortunately, IoT-based surveillance and real-time data can themselves create an abusive environment or enable cyber-stalking. IoT devices collect large amounts of fine-grained data, including account details with shared passwords, a person's behavior and preferences, GPS-tracked movements, and audio and video recordings. The ubiquitous sensing enabled by these devices exacerbates existing forms of abuse. For example, a set of cameras accessed dozens of times from the same remote location should be flagged as an anomaly, since it may turn out that an employee of a service provider is improperly using the cameras to watch people in their homes. Users can take measures such as changing network settings or replacing compromised smart devices. However, such tactics become futile if a perpetrator uses stalkerware (software designed to monitor messages on a device, record screen activity, track its location, and give access to its cameras) or has access to the Internet router and can therefore notice any password change.

Network forensics [9] and penetration testing (ethical hacking) [10] are primary methods for identifying security vulnerabilities. The generic term “forensics” refers to the use of scientific methods and techniques to investigate a crime. Applied in the same way to networks, network forensics involves the use of scientific methods and techniques to capture, store, and analyze network events and attacks. Its techniques include detection, the ability to detect the presence of an attack as early as possible, and prediction, the derivation of the likelihood of future attacks from current data. Figure 1 depicts the primary tasks for improving security systems and preventing attacks. However, current cyber-security solutions leave wide gaps in coverage because the variety and volume of data involved in identifying and predicting security threats are overwhelming.

Fig. 1 Tasks for improving security systems and preventing attacks

A plethora of tools and solutions, such as firewalls, spam filters, and anti-malware, are available to detect and block cyber-attacks and to protect endpoints across environments (homes or organizations), regardless of size or industry. Another highly valuable, and indeed indispensable, tool for network security is the intrusion detection system (IDS). An IDS is a primary cyber-security mechanism because data collected via networked devices can be analyzed to detect illegal access and threats. Typically, an IDS is an application that deals with threats from the Internet or an intranet [11, 12]. IDSs are further categorized as signature-based or anomaly-based [13]. The signature-based approach comprises traditional techniques that fight cyber-attacks by gathering data about malware, data breaches, phishing campaigns, etc., and extracting the relevant data into signatures, that is, digital fingerprints of the attacks. These signatures are then compared against files and network traffic flowing in and out of a network to detect potential threats. While signature-based solutions remain a prevalent form of protection, they are not sufficient against advanced and increasingly sophisticated attacks. Since novel attacks do not appear in the predefined pattern lists, signature-based techniques cannot protect the system against unknown attacks. Moreover, as the types and frequency of attacks grow continuously, keeping the signature database up to date becomes time-consuming and impractical. In an anomaly-based IDS, system traffic is tracked and compared with the system's usual activity and usage, and any significant deviation from the normal pattern of use is regarded as an intrusion. Anomaly detection can therefore identify novel attacks that have not been previously defined, whereas signature-based detection can fail in such circumstances.
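The contrast between the two detection styles can be illustrated with a minimal Python sketch. It assumes a hypothetical signature database of SHA-256 digests of known-malicious payloads and, for the anomaly side, a simple z-score of one observation (connections per minute) against a baseline of normal behavior, with an assumed threshold of three standard deviations. Real IDSs use far richer signatures and models; the names and values here are illustrative only.

```python
import hashlib
import statistics

# Hypothetical signature database: SHA-256 fingerprints of known-malicious payloads.
KNOWN_BAD = {hashlib.sha256(b"malicious-payload-v1").hexdigest()}


def signature_match(payload):
    """Signature-based check: flags only payloads whose fingerprint is already known."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD


def is_anomalous(value, baseline, threshold=3.0):
    """Anomaly-based check: flags observations more than `threshold` sample
    standard deviations away from the mean of normal behaviour (a z-score)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(value - mu) / sigma > threshold


# Normal connection rates (connections per minute) observed for one device.
baseline = [10, 12, 11, 9, 10, 13, 11, 10]

print(signature_match(b"malicious-payload-v1"))  # True: known attack
print(signature_match(b"malicious-payload-v2"))  # False: novel variant is missed
print(is_anomalous(11, baseline))                # False: within normal range
print(is_anomalous(60, baseline))                # True: far outside normal range
```

The second payload shows the weakness described above: a slightly modified attack no longer matches any stored signature, whereas a deviation-based rule can still flag unusual behavior, at the cost of possible false alarms when normal behavior changes.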

An IDS is a security mechanism that works mainly in the network layer of an IoT system. An IDS deployed for an IoT system should be capable of analyzing data packets and generating real-time responses, analyzing packets in different layers of the IoT network with different protocol stacks, and adapting to the different technologies in the IoT environment [14]. In addition, an IDS designed for IoT-based smart environments must operate under constrained conditions (low processing capability, limited storage and battery) while still providing fast responses and processing high volumes of data. The basic IDS categories for IoT are defined by the detection, placement, and validation strategies adopted, as shown in Fig. 2.

Fig. 2 IDS categories for IoT

Predictive techniques for mitigating security issues can be used to defend against attacks, interference, and unauthorized access to information and computers. Recently, soft computing techniques have emerged with proven capabilities to analyze IoT device usage and behavior across very large deployments, at a scale at which a human observer could never correlate the activities that signal abusive behavior. For instance, machine learning techniques can help monitor the access logs of a million security cameras and detect anomalies that might indicate abuse. This process relies on a software “agent,” namely the IDS. An IDS usually handles large volumes of traffic and is challenged by dynamic, extensive, instantaneous, and noisy data. Common issues include obtaining sufficient samples, redundant and irrelevant features, flawed noise removal, and evaluation dilemmas. To reduce processing time and increase the IDS's performance, only the most relevant features are selected. Feature selection is an essential step in machine learning tasks, wherein a subset of the most relevant features is selected from the entire feature set. It is the process of selecting an optimal subset of features with the aim of maximizing or minimizing an objective function. Previous studies confirm that feature selection narrows down the subset of features, or attributes, used in the predictive modeling process, thereby reducing the computational cost of modeling and, in some cases, also improving the performance of the model [15, 16]. Taxonomically, feature selection methods usually fall into one of four categories: filter, wrapper, embedded, and hybrid [17]. Feature selection techniques are advantageous because they can counter the curse of dimensionality, reduce overall training time, curb overfitting, and increase model generalizability.

The accuracy and generalization power of a model can thus be improved by choosing an appropriate feature selection technique [18, 19]. However, selecting the important features without much loss of information is a computationally expensive problem, and the cost multiplies when a large share of the data is unstructured and high-dimensional, as with real-time network traffic big data. It is therefore important to choose a feature selection technique that also gives insight into the features and their relative importance with respect to the target variable.

Feature selection techniques are intended to reduce the number of input variables to those believed to be most useful to a model for predicting the target variable. They can be divided into unsupervised and supervised techniques, and supervised techniques are further classified as filter, wrapper, and intrinsic methods. Filter methods are based on characteristic properties of the features, such as their relevance measured via univariate statistics. In contrast, wrapper methods measure the usefulness of features based on classifier performance. Swarm-based wrapper methods can mitigate the curse of dimensionality by removing unnecessary and irrelevant features from the data. This research proposes hybridizing multiple filter methods, namely information-based (information gain) [20, 21], divergence-based (ReliefF) [22], and dependency-based (chi-square) [23], with a swarm-based ant colony optimization (ACO) [24] wrapper to maximize relevance and minimize redundancy in the feature set. Finally, the optimized feature set is used to train an ensemble learning model (bagging) that makes the final predictions. Bagging reduces variance while leaving bias largely unchanged. The proposed anomaly detection methodology, MFEW_Bagging, classifies usage patterns into normal and abnormal categories. It is evaluated on the publicly available NSL-KDD IDS dataset [25].

The paper is organized as follows: Section 2 characterizes the primary approaches to mitigating IoT-based real-time abuse, followed by a brief overview of related work in Section 3. Section 4 discusses the proposed model, and Section 5 presents the results and discussion. Section 6 concludes the paper and outlines future work.

2 Mitigating IoT-based RTA

IoT devices have become pervasive in our daily lives. This pervasiveness, together with their intrusive data collection and sharing features, can turn them into digital weapons used to harm, intimidate, and abuse people (children, adolescents, women, transgender people) in various locations and situational contexts. Moreover, the diversity of data types and computing power among IoT devices means there is no “one size fits all” cyber-security solution that can protect any IoT deployment. It is therefore imperative to outline approaches that help mitigate IoT-based RTA proactively or reactively. We categorize these approaches into four key types, namely overlooking the problem, prevention of IoT-based RTA, avoidance of IoT-based RTA, and detection of IoT-based RTA. These categories are analogous to the strategies for handling deadlocks in operating systems (ignoring, preventing, avoiding, and detecting them). Prevention and avoidance are proactive approaches, whereas detection is reactive. With the recent upsurge in the use of learning-based techniques, we also consider a proactive, predictive category of mitigating IoT-based RTA. The following subsections describe these approaches.

2.1 Overlooking the problem

IoT implies that numerous devices operate in a particular environment and communicate dynamically. The IoT ecosystem enables information to flow over the Internet, with data collected and exchanged wirelessly. The growing number of IoT devices undoubtedly expands the capabilities of the environment, but at the cost of a wider attack surface. Unfortunately, most users are unaware of the threats and vulnerabilities that may exist. At the same time, much real-time abuse is ignored because victims are unwilling to report such incidents. Ignoring future risks and postponing action are never solutions; providers need to mitigate cyber-security risks and build trust in the power of the IoT.

2.2 Prevention of IoT-based RTA

Prevention addresses situations in which IoT-based abuse is bound to happen unless specific measures are taken to stop it. Success depends on ensuring the integrity and confidentiality of IoT solutions and data while mitigating cyber-security risks. The following techniques can be used as preventive measures to mitigate the risks of IoT-based RTA:

  • Changing passwords/passcodes for each account and device, including the Wi-Fi.

  • Turning off GPS/location services/Bluetooth unless necessary.

  • Using, preferably, a safe (“clean”) device and a new email account that the abuser cannot access for all safety planning.

  • Remaining skeptical of suspicious messages, friend requests, emails, or attempts to collect user info from unknown third parties.

  • Being careful about what is posted, because it might give away information that qualifies as “social leakage.”

  • Keeping all security apps and software updated.

2.3 Avoidance of IoT-based RTA

Avoidance aims to rule out any chance of abuse altogether. It is essential to stay ahead of the curve to avoid the detrimental consequences of compromised networks and faulty technology. This means following certain mandatory security guidelines:

  • Data accountability: All data being collected and stored within an IoT system should be accounted for.

  • Security settings: All connected devices within a network should be configured with security in mind, which includes setting strong usernames and passwords, multi-factor authentication, and encryption.

  • Device physical security: IoT devices should be physically safeguarded against tampering, for example by keeping them in a restricted place or securing them with appropriate locks or other tools.

  • Life cycle approach for IoT security: It is essential to adopt an end-to-end, comprehensive, policy-based architectural approach that addresses all the relevant security themes, including network/application/hardware security, standards, detection and reaction, governance, and maintenance, throughout the life cycle of an IoT object, that is, from manufacture to disposal.

2.4 Detection of IoT-based RTA

Detection is essentially a reactive mechanism, in which an abusive incident should be identified and reported promptly. IoT devices inherently have low computing power, custom architectures, and little memory and storage, whereas standard security solutions require considerable processing performance, are often hard to port to custom architectures, and need much memory and storage for their databases. Detection approaches identify anomalies by mining a pool of data for insights or information. Machine learning (ML) can compensate for this mismatch because it looks for patterns in the given data, is easy to port to new and unknown architectures, and, once trained, can require comparatively little computing power, memory, and storage. Pertinent studies report the use of artificial intelligence (AI) to detect anomalies by modeling, in real time, network traffic, log and audit files, network nodes, servers, and all “smart IoT” devices. ML-based solutions can mitigate the risks of new malware that has no defined “signature” (zero-day attacks) and, at the same time, can counter advanced persistent threats (APTs), where adaptive learning algorithms can detect the step-by-step penetration of APT malware (phishing, Trojans, adware, botnets, etc.).

In addition, ML-based predictive analytics can be used to proactively detect and analyze threats, providing actionable insights that help security analysts make informed decisions quickly and accurately. This research reports one such methodology, in which an ML model trained on an optimal feature set identifies anomalies in real time, so that observations detected and classified in the past help classify future data points. Prediction reduces the time a security analyst needs to make critical decisions and launch a systematic response to resolve the threat. Figure 3 contrasts proactive, predictive, and reactive mechanisms for mitigating IoT-based abuse.

Fig. 3 Proactive vs. predictive vs. reactive mechanisms

3 Related work

Most of the existing literature discusses the design of self-security devices, alarm systems, and SOS devices such as wearables, RFID tags, buttons, and GPS-GSM-enabled trackers, which serve as reactive safety mechanisms during an abusive incident. A few studies also focus on how IoT devices can facilitate the detection of abusive incidents and provide a proactive mechanism. However, to the best of our knowledge, no study discusses solution-based approaches to IoT-based real-time abuse, that is, to the misuse of IoT for abuse. Meanwhile, most studies have focused on using IDS data to analyze potential attacks. Intrusion detection has been a popular research field for many years, and several intrusion detection systems have been described in the literature. Researchers have applied machine learning algorithms to various publicly available anomaly-based IDS datasets and evaluated their performance [26,27,28]. In 2020, Chkirbene et al. [29] proposed a trust-based intrusion detection and classification system that limited the size of the input feature set using a novel feature selection method. In 2019, Xiao et al. [30] proposed a network IDS model based on a convolutional neural network. In the same year, Kasongo et al. [31] proposed a deep learning-based IDS that combines neural networks with a filter-based feature selection algorithm. A variety of filter and wrapper methods have also been used in anomaly detection systems. In 2016, Osanaiye et al. [32] put forward an ensemble-based multi-filter feature selection (EMFFS) method that combines the outputs of four filters, namely information gain (IG), gain ratio, chi-square, and ReliefF, to select important features. In 2020, Zhou et al. [33] proposed a heuristic algorithm with a voting classifier for intrusion detection and achieved an accuracy of 99.8%.

4 MFEW_Bagging: hybrid feature selection with ensemble learning to detect and predict anomalies for IoT-based RTA

One desirable characteristic of an ML model is low variance, that is, it should not overfit the training data and lose its ability to generalize to unseen data. A key way to achieve this is to minimize the number of features used to train the model. Feature selection techniques enable selecting a near-optimal set of input variables that minimizes variance and maximizes the generalizability of the model. With fewer features, these techniques can also improve model performance, reduce training time, and make debugging and explanation easier. However, a single feature selection method may produce a locally optimal or sub-optimal subset of features, compromising efficiency. An ensemble feature selection approach combines multiple feature subsets, through a combination of feature rankings, to select an appropriate subset of features that increases classification accuracy. This paper puts forward a predictive analytics approach and examines its capabilities for mitigating IoT-based abuse. A multiple filter ensemble with wrapper-based feature selection generates an optimal feature set, which is then used to train an ensemble bagging classifier that labels each instance as normal or anomalous. Figure 4 depicts the proposed MFEW_Bagging methodology.

Fig. 4 Proposed MFEW_Bagging methodology

4.1 Filter methods

Filter methods select the most significant features from a given feature set based on characteristic properties of the features, such as their relevance measured via univariate statistics. A number of filter methods are available in the literature, broadly based on measures of information (or uncertainty), distance, and dependence (or probability). In this work, we use the information-based information gain method, the divergence-based ReliefF, and the dependency-based chi-square method to form a multi-filter that harnesses their combined strengths and alleviates bias in the selected features.

4.1.1 Information gain

Information gain (IG) measures the relevance of a particular feature for determining the class label. It quantifies the information gained in predicting a class value when a particular feature is present or absent. It is based on the concept of entropy and can be defined as “a measure of the reduction in entropy of the class variable after the value for the feature is observed.” It is calculated as in (1).

$$ \mathrm{IG}(t)=-\sum_{i=1}^{m}p(c_i)\log p(c_i)+p(t)\sum_{i=1}^{m}p(c_i\mid t)\log p(c_i\mid t)+p(t')\sum_{i=1}^{m}p(c_i\mid t')\log p(c_i\mid t') $$
(1)

where $c_i$ denotes the $i$th class and $m$ the number of classes; $p(c_i)$ is the probability of the $i$th class; $p(t)$ and $p(t')$ are the probabilities of the presence and absence of feature $t$, respectively; and $p(c_i \mid t)$ and $p(c_i \mid t')$ are the conditional probabilities of the $i$th class given the presence and absence of feature $t$, respectively.
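Equation (1) can be checked numerically. The following Python sketch (standard library only, base-2 logarithms) computes information gain in the equivalent form $\mathrm{IG} = H(C) - \sum_v p(v)\,H(C \mid v)$, which for a binary present/absent feature reduces exactly to (1). The toy records are hypothetical and are not NSL-KDD data; only the feature names `flag` and `protocol_type` are borrowed from the dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(C) - sum_v p(v) * H(C | feature = v) for a discrete feature.

    For a binary present/absent feature t this equals Eq. (1)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy records (not taken from NSL-KDD).
labels = ["normal", "normal", "anomaly", "anomaly"]
flag = ["SF", "SF", "S0", "S0"]               # separates the classes perfectly
protocol_type = ["tcp", "udp", "tcp", "udp"]  # carries no class information

print(information_gain(flag, labels))           # 1.0 (bit)
print(information_gain(protocol_type, labels))  # 0.0
```

A feature that determines the class removes all of the class entropy (1 bit for two balanced classes), while a feature whose values are spread evenly across classes yields no gain, which is why IG ranks the first above the second.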

4.1.2 Chi-square

The chi-square (CS) test usually refers to Pearson's chi-square test, which can be applied either as a goodness-of-fit test or as a test of independence; feature selection uses the latter. It is used when there are two categorical variables and the aim is to determine whether there is a significant association between them. Because it measures the dependence between random variables, it “weeds out” the features that are most likely to be independent of the class and therefore irrelevant for classification. It is calculated as in (2).

$$ {\chi}^2=\sum \frac{{\left({O}_i-{E}_i\right)}^2}{E_i} $$
(2)

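where $O_i$ is the observed frequency and $E_i$ the expected frequency (under the assumption of independence) in cell $i$ of the feature-by-class contingency table.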
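As an illustration of (2) for feature selection, the sketch below builds the feature-by-class contingency table and sums $(O_i - E_i)^2 / E_i$ over its cells, taking the expected count of each cell as (row total × column total)/n under the independence assumption. The toy data are hypothetical. Library routines may compute the score differently (for example, scikit-learn's `chi2` treats non-negative feature values as frequencies), so values need not match a given toolkit.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic (Eq. 2) of the feature-by-class contingency table.

    O_i is the observed count of a (feature value, class) cell; E_i is the count
    expected under independence: row total * column total / n."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    row_totals = Counter(feature)
    col_totals = Counter(labels)
    chi2 = 0.0
    for value, r in row_totals.items():
        for cls, c in col_totals.items():
            expected = r * c / n
            chi2 += (observed[(value, cls)] - expected) ** 2 / expected
    return chi2


# Hypothetical toy records (not taken from NSL-KDD).
labels = ["normal", "normal", "anomaly", "anomaly"]
flag = ["SF", "SF", "S0", "S0"]
protocol_type = ["tcp", "udp", "tcp", "udp"]

print(chi_square(flag, labels))           # 4.0: strong dependence on the class
print(chi_square(protocol_type, labels))  # 0.0: independent of the class
```

Features with a statistic close to zero behave as if independent of the class and are the ones the filter discards.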

4.1.3 Relief-F algorithm (Relief)

Relief is an instance-based, heuristic method that assigns a weight to each feature according to how well the feature discriminates between near neighbors. Algorithm 1 describes the basic Relief filter method.

Algorithm 1 Basic Relief

ReliefF evolved from the original Relief algorithm and was developed to overcome its limitations. Kononenko [22] proposed a number of updates to Relief; notably, the “F” in ReliefF refers to the sixth of the proposed algorithm variations (A to F). First, ReliefF relies on a user parameter k, the “number of neighbors,” which specifies that the k nearest hits and k nearest misses are used in the scoring update for each target instance (rather than a single hit and miss). This change increased the reliability of the weight estimates, particularly in noisy problems. Second, three different strategies were proposed to handle incomplete data (i.e., missing data values), under the names ReliefB, ReliefC, and ReliefD. Third, two different strategies were proposed to handle multi-class endpoints, under the names ReliefE and ReliefF. ReliefF, which inherited the changes proposed in ReliefA and ReliefD, was selected as the best approach. During scoring in multi-class problems, ReliefF finds the k nearest misses from each “other” class and averages the weight update based on the prior probability of each class (Algorithm 2).

Algorithm 2 ReliefF
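The ReliefF weight update described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 2: features are numeric, per-feature differences are normalized by the feature's range, neighbors are found with the Manhattan distance over those normalized differences, and missing values are not handled. Nearest hits lower a feature's weight, while nearest misses from each other class raise it, scaled by that class's prior probability as in Kononenko's multi-class update.

```python
import random


def relieff(X, y, k=1, m=None, seed=0):
    """Minimal ReliefF sketch for numeric features; returns one weight per feature.

    X: list of feature vectors, y: class labels, k: nearest hits/misses per class,
    m: number of randomly sampled target instances (defaults to len(X))."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n if m is None else m
    # Normalise per-feature differences by the feature's range so they lie in [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * d
    for _ in range(m):
        r = rng.randrange(n)
        # k nearest neighbours of r inside every class (r itself excluded).
        near = {
            c: sorted((i for i in range(n) if y[i] == c and i != r),
                      key=lambda i: dist(r, i))[:k]
            for c in classes
        }
        for j in range(d):
            # Nearest hits (same class) lower the weight ...
            w[j] -= sum(diff(j, r, h) for h in near[y[r]]) / (m * k)
            # ... nearest misses from each other class raise it, weighted by prior.
            for c in classes:
                if c != y[r]:
                    scale = prior[c] / (1.0 - prior[y[r]])
                    w[j] += scale * sum(diff(j, r, h) for h in near[c]) / (m * k)
    return w


# Hypothetical data: feature 0 separates the classes, feature 1 takes the
# same values in both classes and is therefore irrelevant.
X = [[0.0, 1], [0.1, 5], [0.2, 9], [0.8, 1], [0.9, 5], [1.0, 9]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=1)
print(weights)  # feature 0 receives a positive weight, feature 1 a negative one
```

Ranking features by these weights gives the ReliefF ordering that the multiple filter step consumes.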

Next, a multiple filter feature selection (MFFS) technique creates a new search space that combines the best of all three filter methods. That is, MFFS ranks and sorts the features of the dataset according to each filter method, takes the top N features from each of the three filter rankings (R1, R2, and R3, respectively), and applies a set union to include the best features from all three rankings, thus generating a selected feature set, S. Algorithm 3 outlines the ensemble MFFS technique.

Algorithm 3 Ensemble MFFS
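The union step of MFFS can be expressed in a few lines of Python. The filter scores below are hypothetical placeholders (in practice they would come from information gain, chi-square, and ReliefF), and N = 2 is used for brevity, whereas the experiments in Section 5 use N = 14. Breaking ties in a ranking alphabetically is an assumption of this sketch.

```python
def top_n(scores, n):
    """Names of the n highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:n]}


def mffs(rankings, n):
    """Multiple filter feature selection: S = top_n(R1) ∪ top_n(R2) ∪ top_n(R3)."""
    selected = set()
    for scores in rankings:
        selected |= top_n(scores, n)
    return selected


# Hypothetical filter scores; real ones would come from IG, chi-square, and ReliefF.
r1_ig = {"service": 0.90, "flag": 0.80, "src_bytes": 0.70, "duration": 0.10}
r2_chi = {"service": 50.0, "src_bytes": 40.0, "count": 35.0, "duration": 2.0}
r3_relief = {"flag": 0.30, "count": 0.25, "same_srv_rate": 0.20, "duration": 0.01}

S = mffs([r1_ig, r2_chi, r3_relief], n=2)
print(sorted(S))  # ['count', 'flag', 'service', 'src_bytes']
```

Because the union keeps any feature ranked highly by at least one filter, S is generally larger than N but smaller than 3N when the rankings overlap, which matches the 19 features obtained from three top-14 lists in Section 5.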

This feature set (S) may still be large, particularly for dynamic, real-time data generated in large volumes and at high velocity, such as network traffic. A wrapper method is therefore justified to find the most useful features.

4.2 Wrapper methods

In contrast to filter methods, wrapper methods measure the usefulness of features based on classifier performance. Given the large number of attributes, it is imperative to select the relevant few to shorten training time, enhance the generalizability of the model by avoiding overfitting, obtain simpler models, and avoid the curse of dimensionality. Swarm algorithms are a class of population-based meta-heuristics that search for an optimal solution using a set of collective, decentralized, distributed, and self-organizing agents. The most prominent swarm-based algorithms are those inspired by the behavior of species in nature, such as birds, ants, and other insects. In this paper, we use a swarm-based ACO wrapper algorithm as the search method for finding an optimal feature subset (F).

4.2.1 Ant colony optimization

Proposed by Dorigo [24] in 1992, ACO is inspired by the way ants communicate. When searching for food, ants start off in random directions. On finding food, an ant returns to its colony, leaving a pheromone (chemical) trail on the way back. The trail is reinforced if other ants follow the path and also find food; otherwise, it becomes fainter as the pheromone evaporates over time. The pseudo-code for ACO is given in Fig. 5.

Fig. 5 Ant colony optimization

The algorithm for ACO is given in Algorithm 4.

Algorithm 4 Ant colony optimization
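An ACO wrapper for feature selection can be sketched as follows. This is a deliberately simplified binary variant and not necessarily the formulation of Algorithm 4: each feature carries a pheromone value τ, an ant includes a feature with probability τ/(1 + τ), all trails evaporate by a factor (1 − ρ) in each iteration, and the iteration's best ant deposits pheromone equal to its fitness on the features it used. The `fitness` function is a hypothetical stand-in; in a real wrapper it would be the validation performance of a classifier trained on the candidate subset.

```python
import random


def aco_feature_selection(features, fitness, n_ants=10, n_iter=30, rho=0.2, seed=0):
    """Simplified binary ACO wrapper; returns (best_subset, best_score).

    `fitness(subset)` returns a score to maximise, e.g. the validation
    accuracy of a classifier trained on `subset` (hypothetical here)."""
    rng = random.Random(seed)
    tau = {f: 1.0 for f in features}  # tau = 1 gives a 50 % inclusion probability
    best_subset, best_score = set(), float("-inf")
    for _ in range(n_iter):
        colony = []
        for _ in range(n_ants):
            # Each ant builds a subset, favouring features with strong trails.
            subset = {f for f in features if rng.random() < tau[f] / (1.0 + tau[f])}
            if subset:
                colony.append((fitness(subset), subset))
        if not colony:
            continue
        # Evaporation: every trail fades by a factor (1 - rho).
        for f in features:
            tau[f] *= 1.0 - rho
        # Deposit: the iteration's best ant reinforces the features it used.
        score, subset = max(colony, key=lambda pair: pair[0])
        for f in subset:
            tau[f] += max(score, 0.0)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical fitness: reward "useful" features, penalise irrelevant ones.
useful = {"service", "flag", "src_bytes"}
features = ["service", "flag", "src_bytes", "duration"]


def toy_fitness(subset):
    return len(subset & useful) - 0.5 * len(subset - useful)


best, score = aco_feature_selection(features, toy_fitness)
print(sorted(best), score)
```

Evaporation lets the search forget features that stop appearing in good subsets, while deposits concentrate the colony on features that repeatedly help the classifier; ρ controls how quickly that forgetting happens.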

4.3 Ensemble classifier

Finally, ensemble learning is used to classify intrusions. Ensemble classifiers combine the classification results of different classifiers to produce the final output. In this work, we use bagging. Bagging [34], short for bootstrap aggregating, increases accuracy by decreasing variance. In bagging, each model in the ensemble is trained on a random training set, and all models vote with equal weight. Each training set is generated by sampling the original dataset with replacement, producing a multiset of the same cardinality (size) as the original dataset.
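Bootstrap aggregating can be sketched in plain Python. The base learner here is a decision stump (a single-feature threshold rule), chosen only to keep the example short; the section does not specify the base learner, and in practice a library implementation such as scikit-learn's `BaggingClassifier` would typically be used. Labels 0 and 1 stand for normal and anomalous traffic, and the one-feature training data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Illustrative base learner: the single-feature threshold rule
    (decision stump) with the fewest training errors on (X, y)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[j] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    _, j, t, left, right = best
    return lambda row: left if row[j] <= t else right


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train each model on a bootstrap sample: n draws *with replacement*."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Equal-weight majority vote over the ensemble."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Hypothetical one-feature data: 0 = normal, 1 = anomaly.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [0.05]), bagging_predict(models, [0.95]))
```

Individual stumps trained on unlucky bootstrap samples (for example, one containing only anomalies) can be wrong, but the equal-weight vote over many samples averages such errors out, which is the variance reduction that bagging relies on. An odd number of estimators avoids ties in the binary vote.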

5 Results and discussion

This section evaluates the performance of the proposed work experimentally using the NSL-KDD IDS dataset [25]. The IDS classification was implemented on a 2.7-GHz Intel Core i5 machine with 16 GB of RAM. NSL-KDD is an improved variant of the KDD Cup 99 dataset that contains no redundant records, which avoids biasing classifiers toward frequent records. Each record has 41 features plus a class label attribute, and the features can be grouped into four categories, as shown in Fig. 6.

Fig. 6 Feature categories of the NSL-KDD IDS dataset

Table 1 lists the features in the NSL-KDD dataset.

Table 1 Feature set of the NSL-KDD IDS dataset

The metrics used to estimate the performance of the proposed work are given in Table 2.

Table 2 Evaluation metrics

In Table 2, true positives (TP) are anomalous instances correctly classified as anomalies, true negatives (TN) are normal instances correctly classified as normal, false negatives (FN) are anomalous instances wrongly classified as normal, and false positives (FP) are normal instances wrongly classified as anomalies. A good attack-detection classifier should have a high detection rate (DR) and a low false alarm rate (FAR).
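The sketch below computes these quantities, assuming the standard definitions accuracy = (TP + TN)/(TP + TN + FP + FN), DR = TP/(TP + FN), and FAR = FP/(FP + TN); these may differ in detail from the formulas listed in Table 2. The label strings and the counts in the example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="anomaly"):
    """Count TP, TN, FP, FN, treating `positive` (an anomaly) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def ids_metrics(tp, tn, fp, fn):
    """Accuracy, detection rate (DR), and false alarm rate (FAR)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "DR": tp / (tp + fn),   # share of anomalies that are detected
        "FAR": fp / (fp + tn),  # share of normal instances raised as alarms
    }


# Hypothetical counts for 200 test instances.
print(ids_metrics(tp=90, tn=95, fp=5, fn=10))
# {'accuracy': 0.925, 'DR': 0.9, 'FAR': 0.05}
```

DR and FAR are computed over different denominators (anomalous vs. normal instances), so a classifier can have high accuracy on an imbalanced dataset while still missing many attacks; reporting both alongside accuracy guards against that.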

The top N ranked features from each filter were considered. N was set to 14, roughly one third of the 41 features. A set union then produced the multiple filter feature set (S) with 19 features. These 19 features were input to the wrapper, which selected the most relevant and useful features to form the optimal feature set (F) used to train the ensemble classifier. Table 3 details the features selected by the individual filter methods, their ensemble, and the subsequent wrapper.

Table 3 Features selected

The 10 features finally selected by the multiple filter ensemble and the wrapper were f3-Service, f4-Flag, f5-Src_bytes, f6-Dst_bytes, f12-Logged_in, f23-Count, f26-Srv_serror_rate, f29-Same_srv_rate, f30-Diff_srv_rate, and f39-Dst_host_srv_serror_rate.

The proposed MFEW_Bagging was evaluated using accuracy, detection rate, and false alarm rate. The performance of the individual filters, of each filter combined with the wrapper, and of the wrapper alone was also evaluated. The proposed methodology achieved the highest accuracy, 99.86%, with a FAR of 0.002. The comparative performance results are shown in Table 4.

Table 4 Performance results

To evaluate the effectiveness of the ensemble learning technique, bagging was also compared with boosting; Table 5 shows the results. The bagging classifier outperformed the boosting classifier. Figure 7 compares the accuracy of bagging and boosting when features are selected by the multiple filter ensemble, both with and without the wrapper.

Table 5 Ensemble techniques comparison

Fig. 7 Accuracy comparison of ensemble classifiers

The primary objective of this research was to understand and characterize IoT-facilitated real-time abuse, not to improve on state-of-the-art (SOTA) anomaly detection techniques. However, to better understand how ML-based predictive analytics helps proactively detect and analyze threats, we compared our results with a recent SOTA ensemble model [33] that uses correlation-based feature selection with the bat algorithm. The proposed methodology achieved results comparable to those of the SOTA technique, as shown in Fig. 8.

Fig. 8 Performance comparison with SOTA

6 Conclusion

As technology has advanced, attackers have devised novel and potent ways of exploiting our devices and invading our privacy. The use of IoT devices as a means of abuse is an emerging technological challenge that gives abusers new opportunities to control, harass, and stalk their victims. This paper argued for the need to develop approaches to prevent, avoid, detect, and predict IoT-based real-time abuse. A set of mitigation approaches was put forward, and a prediction model for detecting abnormal usage patterns was proposed. The proposed MFEW_Bagging methodology used a multiple filter ensemble with a swarm-based wrapper to reduce the feature set and then trained a bagging classifier. The results were comparable to existing work, with an accuracy of 99.8% on the benchmark NSL-KDD dataset using 10 of the original 41 features. This research thus recognizes that, with the increasing diversity of data types and computing power among IoT devices, there is no “one size fits all” cyber-security solution that can protect any IoT deployment against all types of cyber risk. Therefore, as IoT complexity grows, understanding risks in a proactive, predictive manner is an effective way to defend networks and systems. In future work, we would like to test other filters and wrappers for feature set reduction. The robustness of the methodology also needs to be evaluated on other available benchmark datasets. As IoT devices are resource-constrained, with limited power and memory, the energy consumption, processing time, and performance overhead of an IDS are important performance metrics. Robust and lightweight IDS designs for IoT-based smart environments that consider all these factors are therefore urgently needed.