1 Introduction

Wireless Mesh Network (WMN) has emerged as a new technology to cope with the challenges of future mobile communications systems [1], such as providing flexible, adaptive, and reconfigurable network architecture while offering inexpensive solutions to service providers. Service providers are choosing WMNs to provide wider service diversity to costumers, for instance, multimedia and data services that require low latency and high data rates, as WMNs allow fast, robust, and resource-efficient network deployment. Wireless Community Networks and Municipal Wireless Networks are good examples of real-life WMNs offering rich radio multi-hop connectivity to the mesh clients, e.g., mobile devices and computer terminals. This type of network architecture provides low-cost Internet access via Wi-Fi technology to whole communities by using inexpensive IEEE 802.11 or 3G/4G wireless routers to deploy the WMN depending on the high-speed network available. In that case, the operator of the network acts as a wireless service provider.

WMNs are typically self-organized and self-configured networks [2] whereas the mesh nodes automatically establish a multi-hop ad-hoc network and maintain the connectivity among the participating nodes. In WMNs, the nodes need to trust each other since a node depends on intermediate nodes to reach other nodes and eventually connect to the Internet. Consequently, the routing mechanism of WMN assumes that each participating node provides accurate routing information and cooperatively forwards packets for its neighbors as specified by the routing protocol with no malicious intention. By taking advantage of this assumed trust, a malicious node can easily adulterate the routing paths of the network by disseminating incorrect routing information through the network. Insider attacks are executed by legitimate mesh nodes, i.e., mesh routers and mesh clients, which are compromised by the attacker. These attacks target the routing protocol, e.g., node impersonation and routing misbehavior threats, and are particularly a serious issue for the correct operation of the network. They not only compromise the integrity of the routes but also the availability of the entire WMN including the mesh routers.

Many research efforts have been conducted to conceive security methods for the existing routing protocols of WMNs. In this case, many secure routing protocols have been designed such as ARAN [3], Ariadne [4], SAODV [5], and more recently the secure routing protocol approaches [68]. Most of these approaches rely on key management and encryption technologies to protect the integrity and authenticity of the routing messages as well as prevent unauthorized nodes from joining the network. Outsider attacks, which are performed by malicious wireless cards deployed by the attacker, try to eavesdrop or interfere with the transmissions. These attacks can be avoided by applying cryptographic techniques, such as authentication mechanisms, in which the transmitted messages are protected against external attacks, such as spoofing attacks and passive attacks. Nevertheless, a compromised device owning a legitimate key, or even a non-authenticated mobile node that has access to the network, can generate malicious traffic, intercept and modify packets, and refuse to forward traffic by discarding packets received from the neighboring nodes. Therefore, a Collaborative Intrusion Detection System (CIDS) is essential for WMNs to identify malicious traffic and misbehaving nodes, to secure the routing mechanism, and to cope with the limitations of cryptographic systems acting as a second line of defense.

The main contribution of this work is the design and implementation of a complete distributed and CIDS solution for WMNs. Our CIDS approach detects efficiently insider intrusions in real-time by collaboratively monitoring the network activity and exchanging suspected traffic information, then being able to identify routing attacks in cooperation with the neighboring nodes. To that end, a customizable and flexible Routing Protocol Analyzer (RPA) is proposed for the CIDS in which the collected routing traffic is analyzed and Routing Events are generated to represent the routing protocol behavior. These Routing Events should be processed with the help of a set of protocol constraints/rules, that specify the correct behavior of the protocol, in order to find evidence of malicious routing behavior. To achieve this, a Distributed Intrusion Detection Engine (DIDE) is developed which treats the Routing Events by using Routing Constraints and calculate respective Misbehaving Metrics that indicates the presence of routing misbehavior intrusions on the node interface or in the neighborhood.

Collaboration among neighboring nodes is achieved by a Cooperative Consensus Mechanism (CCM) that implements a consensus procedure which inspects the Misbehaving Metrics of neighbors applying a proposed threshold scheme, and makes a final decision on the malicious routing behavior and the suspicious node that originated the attack. We also propose a Response Mechanism to trigger active and passive remediation actions according to the type of attack detected.

Finally, we demonstrate the efficacy of our distributed CIDS approach on a Virtualized Mesh Network environment to determine if our approach achieves high detection rate and low false positive rate, i.e., false alarms, and how much of computational and communication resources is consumed by the CIDS at the node.

The remainder of this paper is organized as follows. In Section 2, we discuss intrusion detection in WMNs. Section 3 describes the proposed distributed and CIDS architecture. Section 4 presents the evaluation of the approach and obtained results and analyzes the performance of the intrusion detection solution in terms of resources consumption. In Section 5, we discuss related work regarding CIDS in WMNs, and compare our work with other relevant works. Finally, Section 6 concludes the paper and points toward future research directions.

2 Background

In this section, we discuss intrusion detection basics and IDSs approaches in WMNs.

2.1 Intrusion detection

IDS is a device or software application that collects data by monitoring the different system activities, and subsequently analyzes the gathered data to detect possible unauthorized activities, such as malicious activities or security policy violations. IDSs can be classified into two categories according to the monitored events: (i) host-based IDS (HIDS), which monitors and analyzes internal events of the computing system such as system calls, application logs, and file system activities; and (ii) network-based IDS (NIDS) that monitors the network traffic, where the network interface of the system is put into promiscuous mode so all packets are captured for subsequent analysis.

IDSs are also classified depending on the detection engine into three main categories as follows. Signature-based or misuse-based IDS compares the collected audit data with a pre-defined set of attack patterns, known as signatures/rules, to identify intrusions. If an intrusion pattern matches, then that attack is detected. The advantage of signature-based detection is the high accuracy. For that reason, signature-based IDSs are widely preferred for commercial use. One limitation of this detection technique is it can only detect known attacks, not being able to detect new attacks or even variants of known attacks. To circumvent that drawback, Marín-Blázquez and Martínez Pérez [9] proposed the use of a XCS variant for the misuse detection engine by applying hedged linguistic fuzzy classifiers, then adding human interpretation when detecting the intrusions. Little research on signature-based IDS approaches for WMNs has been proposed since the constant need of updating the signature database poses a problem for the WMN architecture as the IDS database of each node has to be individually updated each time a new threat is found or needs to be updated.

Anomaly-based IDS defines statistically the normal behavior of the system, usually through automated training, and matches it against the actual system activity to signalize any significant deviation as anomalous behavior. The advantage is the capacity to detect unknown attacks without prior experience. The major drawback is the high false positive rate since normal behavior can have large deviations and it changes over time. Schmidt et al. [10] show how to monitor smartphones for extracting features for remote anomaly detection. Damopoulos et al. [11] also focused on anomaly detection for mobile devices by evaluating different machine learning algorithms with the users’ collected data. The approach classifies the behavior of users in terms of calls, SMS, and Web usage to detect illegitimate service use. However, concerning network traffic-level analysis, there is no clear separation between normal and abnormal network activities in mobile wireless scenarios since nodes can join or leave the network arbitrarily and the communication links are unreliable, e.g., with packet losses. Then, fluctuations on the transmission of packets can be attributed to a malicious node, or a node suffering from poor wireless connection, or a node moving independently in the network.

Specification-based IDS relies on a set of constraints/rules that describe the correct behavior of the system, and monitor the actual system behavior in execution regarding these constraints. This technique combines the strengths of anomaly-based and signature-based detection, and it is able to detect known and unknown attacks with low false positives. However, this approach cannot detect some type of attacks, such as DoS attacks because they do not directly violate the system specification. A number of specification-based IDSs [1214] have been proposed for WMNs which basically apply formal methods to specify the correct behavior of the routing protocol. Nevertheless, these approaches analyze the protocol behavior for each node separately not considering the point of view of the neighboring nodes for cooperative detection of attacks then limiting the intrusion detection process since WMNs are distributed and cooperative in nature and the wireless links have frequent instability.

3 Distributed intrusion detection architecture

A CIDS instance resides on each mesh node and monitors all network traffic that passes through it. Since mesh nodes are the unique source of data in the network, all nodes have to contribute to the process of intrusion detection by: (i) carrying out local monitoring; (ii) analyzing the collected traffic; and (iii) providing traffic information and detection results to the neighbors when needed. The proposed intrusion detection architecture is illustrated in Fig. 1.

Fig. 1
figure 1

Distributed intrusion detection architecture

Packets sent and received by the node are captured and filtered by the monitoring tool for further analysis. To that end, we employ Bro [15], a popular network-based IDS, which sniffs the node’s network interface by using the capturing library libpcap. We opted for Bro since it provides a modular and flexible architecture that allows adding user-defined protocol analyzers. In addition, Bro provides a specialized language for writing policy scripts. Bro is divided into two layers: an event engine layer that transforms a stream of (filtered) packets into a stream of higher-level network events, and a policy script interpreter layer, which executes security policy scripts written in Bro language for running these events. However, the most prominent Bro’s feature is its communication system, which we use for maintaining IDSs connected and implement the scheme of exchange of network events and intrusion detection information that perfectly fits into our distributed CIDS approach.

Considering that, we implemented: (i) the RPA module at detection tool level of the CIDS architecture, (ii) the attack detection scripts in the DIDE component, which is composed of three intrusion detection modules, and (iii) the Consensus Algorithm in the CCM module at script level in the architecture. We also implemented specific packet filters into Bro main analyzer script in order to select all routing packets we are interested in when monitoring the traffic.

In this work, we assume that the communication channel among CIDSs is secure and reliable, so the exchange of routing events and detection results cannot be compromised by external attackers. The existence of end-to-end message authentication scheme, such as traditional digital signature or message authentication code (MAC) allows each CIDS to efficiently verify the integrity and authenticity of messages exchanged between CIDSs. To that end, we can employ Bro secure transmission mechanism, which basically uses SSL protocol for key exchange, confidentiality, and message integrity. We do not consider attacks against the CIDS, such as DoS attacks, or against the CIDS communication mechanism, which is used to share detection results and traffic information, such message dropping attacks. We assume the CIDS instances exchange data successfully, and are not compromised. Protecting the CIDS is out of the scope of this work.

In the work [16], we proposed an earlier intrusion detection architecture focused on detecting route redirection/manipulation attacks. The detection modules of the IDS were customized to detect this specific attacking technique. In this paper, we propose a complete CIDS architecture which address the common Attacking Methods of routing protocols that we identified thorough security analysis of WMNs [12, 17, 18]: (1) Message Fabrication, (2) Packet Dropping, and (3) Message Flooding. These Attacking Methods target the network layer that is responsible for packet forwarding and routing operations in the node. Then, the attacker goals are: (i) disrupt the routing service by violating the integrity of routing tables and routing messages; (ii) cause DoS on the data delivery, i.e., disrupt the data forwarding service; and (iii) violate the availability of network services and mesh nodes.

3.1 BATMAN protocol

In this work, we make use of Better Approach To Mobile Ad hoc Networking (BATMAN) [19, 20], a proactive routing protocol specially designed for WMNs. We opted for BATMAN as routing protocol since results from practical experiments [21, 22] with BATMAN and other routing protocols, such as OLSR and AODV, have shown that BATMAN outperforms these protocols on most of performance metrics such as route trip delay (in terms of number of hops), route convergence latency, and packet loss rate. BATMAN achieves higher level of stability and packet delivery ratio presenting a lower route change frequency compared to the traditional routing protocols.

BATMAN routing algorithm is detailed as follows [23].

  1. 1.

    Each node, also referred as Originator O r , periodically broadcasts hello messages, known as Originator Messages (OGMs), to tell its neighbors about its existence. An OGM contains: the O r address, the Source node S n address, the Previous sender (node) P n address, a unique Sequence Number SN, a Time To Live (TTL), and a Transmission Quality (TQ) value.

  2. 2.

    Following a neighboring node N b receives an OGM: it modifies the S n address to its own address and rebroadcast the OGM in accordance to BATMAN forwarding rules to tell its neighboring nodes about the existence of the node that originated the OGM, and so on and so forth.

  3. 3.

    Hence, the mesh network is flooded with OGMs until every node has received it at least once, or until they got lost because of the packet loss in the links, or until their TTL value has expired.

  4. 4.

    The SN of OGM is used to verify the message freshness, i.e., to distinguish between new OGMs and duplicated OGMs in order to guarantee that each OGM is only counted once. The amount of OGMs (SN) received from a given O r via each N b is used as a metric to estimate the route quality. Thus, BATMAN chooses as best next-hop to this O r , the N b from which it has received the highest amount of OGMs within a Sliding Window (a packet count metric).

3.2 Routing protocol analyzer

Routing packets transmitted on the node link are captured and analyzed by the RPA. This module generates Routing Events based on the routing behavior of protocol, which are subsequently treated by Routing Event Handlers defined in the policy scripts of DIDE component. The output of RPA module is a stream of Routing Events that describes the observed activities of the routing protocol in semantically rich, high-level terms. The Routing Events themselves do not represent security alerts, but rather provide valuable input for further processing by distributed intrusion detection algorithms.

A Routing Event is a message containing parameters as the result of analysis of one or more routing packets. The RPA module can generate different sorts of Routing Events according to the protocol behavior and the type of network activity it desires to signalize. In order to exchange Routing Event messages between nodes, we employ Bro communication system that uses typical TCP socket client–server model to establish a TCP connection between two nodes for sending and receiving intrusion detection information. Bro uses a transparent communication model, where Bro instances do not differentiate between local and remote Routing Event. In order to receive remote Routing Events, each node specifies the type of Routing Events it wants to subscribe to, and which nodes will send those Routing Events. Based on that, each node forwards its Routing Events to the nodes that have subscribed to them. Then, the receiver nodes will execute respective Routing Event Handlers to treat these remote Routing Events as local Routing Events.

The IDS approach only requires RPA modules to exchange Routing Event messages with neighboring RPA modules within one-hop maximum so that communication and computation overhead can be reduced for the nodes. Exchanging Routing Events instead of exchanging local audit data among neighboring nodes achieves cooperativeness with less communication overhead and processing load for the mesh nodes.

In this work, we assume that Routing Event messages exchanged by the nodes are not violated by attackers, i.e., modified or dropped. However, the constant exchange of Routing Events guarantees a reliable and robust transmission of routing information between CIDSs for calculating accurate Misbehaving Metrics in the presence of communication failures, such as corruption and discarding of Routing Events.

We implement the RPA module in the context of BATMAN protocol, but the RPA can accommodate other routing protocols as well. The RPA handles BATMAN packets and generates the Routing Events OGM_* that represent the protocol semantics:

  1. 1)

    OGM_rcv(<params>): it means an OGM was received from one of the N b .

  2. 2)

    OGM_br(<params>): it means the node broadcasted an OGM to its N b .

  3. 3)

    OGM_rbr(<params>): it means the node rebroadcasted an OGM received from a N b .

The Routing Events < params > are related to the node local context, such as the MAC and IP address, and are extracted from OGMs, such as O r address, S n address, P n address, and SN value. The validation of these parameters by the CIDS prevents man-in-the-middle attacks, where the attacker sniffs and modifies messages exchanged between two end-points. Then, these Routing Events are treated by the own node and sent to its neighbors, which in turn will also treat the same Routing Events.

We have also defined additional BATMAN Events in order to check BATMAN packet integrity, for instance, if BATMAN packet content is adulterated or corrupted compared to BATMAN protocol specification during the message transmission.

  1. 1)

    OGM_trc(<params>): it means an OGM packet is truncated, i.e., the OGM packet size is less than the minimum BATMAN packet size allowed. In that case, <params > contains the related inconsistency detected.

  2. 2)

    OGM_err(<params>): it means an OGM packet contains an error such as a spurious field value. For example, the BATMN version identified in the OGM is not compatible with the current BATMAN version of node. Then, <params > contains the corresponding error code as well as the erroneous values found.

3.3 Distributed intrusion detection engine

The DIDE is implemented at script level of the CIDS architecture as policy scripts, i.e., intrusion detection algorithms. In DIDE component, each intrusion detection module treats the Routing Events generated by the node and also the Routing Events received from neighboring nodes for performing cooperative attack detection. Routing Events received from neighboring nodes are forwarded to individual intrusion detection modules of DIDE component by RPA module for subsequent treatment. Routing Event Handlers are defined at each intrusion detection module in order to process the respective Routing Events, i.e., each type of Routing Event is bound to a particular Routing Event Handler.

The output of DIDE component is Misbehaving Metrics, which indicate the existence of malicious routing behavior in the own node or in the communication range of the neighborhood. However, each intrusion detection module calculates individual Misbehaving Metrics independently.

The attack detection algorithms, i.e., policy scripts, are integrated into DIDE component. An attack detection algorithm is essentially made up of Routing Event Handlers that specify what to do whenever a given Routing Event occurs. We make use of Bro’s customized scripting language for implementing all the attack detection algorithms. Therefore, each intrusion detection module possess a set of attack detection algorithms that identify various types of attacks according to the attacking method of adversarial node.

The DIDE architecture is divided into three main intrusion detection modules as follow:

  1. 1.

    Packet Dropping Detection. This module is in charge of detecting malicious packet dropping behavior of misbehaving nodes such as selective/conditional, probabilistic, random, and periodic packet dropping. It processes local and remote Routing Events by making use of specific Routing Event Handlers, which in turn implement the algorithms for detection of the packet dropping variations. The detection algorithms compute the Misbehaving Metrics, i.e., Packet Dropping Metrics, which serve as input to the CCM module

  2. 2.

    Message Fabrication Detection. The module detects routing packets forged by the attacker and injected in the routing flow of the network. The forged packets usually follow the routing protocol specification, i.e., the message content is legitimate and messages interval and sequence are respected. The message fabrication detection algorithms use particular Routing Constraints to process the Routing Events and calculate the Misbehaving Metrics.

  3. 3.

    Flooding Detection. This module implements detection algorithms for recognizing overflow of routing packets through the network, which may contain legitimate content or not, e.g., forged S n addresses. The detection algorithms are close to the ones of the Message Fabrication Detection module, i.e., the Routing Constraints employed are similar. However, if Misbehaving Metrics calculated exceeds a certain threshold in a short period of time, an Active Response is promptly requested to the Response Mechanism without previous analysis of the Misbehaving Metrics by the CCM.

In this work, we do not address message flooding attacks since tracking down the source of the intrusion for this type of attack is not a trivial task considering the fact that the attack is carried out through intermediaries nodes in the network, i.e., not from direct neighbors. For instance, distributed denial of service (DDoS) attack is a message flooding attack, where the malicious flooding traffic is transmitted by innocent nodes that are simply rebroadcasting the phony messages received from neighboring nodes as expected by the protocol behavior, which in turn received these messages from other neighbors, and finally those bogus messages were primarily originated by the malicious nodes. Thus, our CIDS approach would identify intermediate honest nodes as the flooding nodes. However, it should trace the attack further back to the remote source nodes that firstly created the malicious traffic. The reason is that the CIDS communication mechanism is limited to local neighborhoods.

3.3.1 Message fabrication detection

The Message Fabrication Detection module implements the following Routing Event Handlers REH, which treat both local and remote Routing Events, i.e., the ones generated by the node and those received from the node’s neighbors N b .

  1. 1.

    REH_rcv: it extracts the parameters from local and remote OGM_rcv and stores the data in sets:

    • Local OGM_rcv: \( Rc{v}_{O_r}^{S_n}\leftarrow SN \),

      where \( Rc{v}_{O_r}^{S_n} \) is the set of SN received by the node for the O r from its neighbor S n .

    • Remote OGM_rcv: \( NR{v}_{O_r}^{S_n,{P}_n}\leftarrow SN \),

      where \( NR{v}_{O_r}^{S_n,{P}_n} \) is the set of SN received by the node’s neighbor S n for O r from P n , and P n is a neighbor of S n .

  2. 2.

    REH_rbr: it processes local and remote OGM_rbr and stores the data in sets as follow:

    • Local OGM_rbr: \( Rb{r}_{O_r}^{S_n}\leftarrow SN \),

      where \( Rb{r}_{O_r}^{S_n} \) is the set of SN rebroadcasted by the node for O r previously received from S n .

    • Remote OGM_rbr: \( NR{r}_{O_r}^{S_n,{P}_n}\leftarrow SN \),

      where \( NR{r}_{O_r}^{S_n,{P}_n} \) is the set of SN rebroadcasted by the node’s neighbor S n for O r received from P n .

We analyzed the BATMAN protocol specification and defined attack detection Constraints in order to prevent routing attacks which employ Message Fabrication Attacking Method. The Routing Constraints C rely on the protocol behavior for detecting attacks and are implemented in the REH. In case of any Routing Constraint C is violated, the message fabrication detection algorithm counts the number of inconsistencies F found for each Routing Constraint separately.

  1. 1)

    REH_rbr: C 1 ← if SN\( Rc{v}_{O_r}^{S_n} \) then F 1++,

    check if the SN rebroadcasted by the node for O r was received from its neighbor S n , where F 1 are the fake SN rebroadcasted by the node which are supposed to be firstly received from S n .

  2. 2)

    REH_rcv: C 2 ← if SN\( NR{v}_{O_r}^{S_n,{P}_n} \) then F 2++,

    check if the SN received by the node for O r was received from its neighbor S n which received it earlier from its neighbor P n , where F 2 are the fake SN received by the node from its neighbor S n , which are supposed to be firstly received by S n from its neighbor P n .

  3. 3)

    REH_rcv: C 3 ← if SN\( NR{r}_{O_r}^{S_n,{P}_n} \) then F 3++,

    check if the SN received by the node for O r was rebroadcasted by its neighbor S n , where F 3 are the fake SN received by the node which are supposed to be rebroadcasted by its neighbor S n .

    Then, the detection algorithm calculates rates of routing misbehavior R considering the inconsistencies F detected, which are the Message Fabrication Metrics output by this module.

    • \( {R}_1=\frac{F_1}{\left| Rb{r}_{O_r}^{S_n}\right|} \), where R 1 is the rate of fake SN rebroadcasted by the node for O r .

    • \( {R}_2=\frac{F_2}{\left| Rc{v}_{O_r}^{S_n}\right|} \), where R 2 is the rate of fake SN received by the node from its neighbor S n for O r that were not received by S n from P n .

    • \( {R}_3\frac{F_3}{\left| Rc{v}_{O_r}^{S_n}\right|} \), where R 3 is the rate of fake SN received by the node for O r that were not rebroadcasted by its neighbor S n .

The Message Fabrication Metrics R are provided to the CCM module for further analysis.

3.3.2 Packet dropping detection

The Packet Dropping Detection module implements the same REH as the Message Fabrication Detection module to process local and remote Routing Events plus this REH_br:

  1. 1.

    REH_br: it processes the parameters of local and remote ORM_br and stores the data in the sets:

    • Br ← SN, where Br is set of SN broadcasted by the node.

    • \( NB{r}^{S_n}\leftarrow SN \), where \( NB{r}^{S_n} \) is set of SN broadcasted by the node’s neighbor S n .

The REH store the current routing protocol activity and incorporate it into the analyses of the future routing activity. For that purpose, we also define Routing Constraints C that are implemented by the REH in order to count the number of malicious packet dropping PD if the any C is violated.

  1. 1)

    REH_rcv: it implements C 1 that checks the routing behavior of the node’s N b by using remote OGM_rcv, and C 2 that checks the routing behavior of the node applying local OGM_rcv.

    • C 1 ← if SN\( NR{r}_{O_r}^{S_n,{P}_n} \) then PD 1++,

      check if the SN received by the node’s neighbor S n from P n for O r is rebroadcasted by S n , where PD 1 are the SN dropped by S n for O r that are supposed to be rebroadcasted by S n .

    • C 2 ← if SN\( Rb{r}_{O_r}^{S_n} \) then PD 2++,

      check if the SN received by the node from its neighbor S n for O r is rebroadcasted by the own node, where PD 2 are SN dropped by the node for O r that are supposed to be rebroadcasted by the node.

  2. 2)

    REH_br: it implements C 3 for checking the routing behavior of the node and its N b using local OGM_br, and C 4 to check the routing behavior of the node and its N b applying remote OGM_rcv.

    • C 3 ← if SN\( Rc{v}_{O_r}^{S_n} \) then PD 3++,

      check if the SN broadcasted by the node’s neighbor S n has been received by the node, where PD 3 are the SN dropped by the node that are originated by S n .

    • C 4 ← if SN\( NR{v}_{O_r}^{S_n,{P}_n} \) then PD 4++,

      check if the SN broadcasted by the node is received by the neighbor S n , where PD 4 are the SN presumably dropped by the neighboring node S n .

The next step of the packet dropping detection algorithm is calculate the rates of packet drop routing misbehavior RD for the respective C. The RD values compose the Packet Dropping Metrics.

  • \( R{D}_1=\frac{P{D}_1}{\left| NR{v}_{O_r}^{S_n,{P}_n}\right|} \) where RD 1 is the rate of SN dropped by the neighbor S n for O r that are not forwarded by S n .

  • \( R{D}_2=\frac{P{D}_2}{\left| Rc{v}_{O_r}^{S_n}\right|} \), where RD 2 is the rate of SN dropped by the node for O r that are not forwarded by the node.

  • \( R{D}_3=\frac{P{D}_3}{\left| NR{r}_{O_r}^{S_n,{P}_n}\right|} \), where RD 3 is the rate of SN broadcasted by S n that are dropped by the node before processed by the node.

  • \( R{D}_4=\frac{P{D}_4}{\left| Br\right|} \), where RD 4 is the rate of SN broadcasted by the node that are dropped by the neighbor S n before being processed by S n .

The Packet Dropping Metrics RD are constantly supplied to the CCM module for distributed detection of packet dropping misbehavior. These Metrics are sufficient for detecting most of the Packet Dropping Attacking Method variants such as arbitrary and intermittent packet dropping.

3.4 Cooperative consensus mechanism

Nodes in a neighborhood collaborate with each other to make a decision about a suspicious node. This is performed by applying a consensus decision-making procedure [24], in which a node is considered malicious when the majority of its neighbors have reached a consensus. If all nodes contribute to the outcome, a few malicious or malfunctioning nodes cannot easily disrupt the consensus process.

The reason for such procedure is that Misbehaving Metrics calculated by only one node may be inconsistent because of: (i) the packet loss of wireless links caused by interferences and packet collision, (ii) the node is distant from the other nodes, (iii) the node is faulty, (iv) the node is compromised, (v) mobility of nodes, for instance, a roaming node. Consequently, this impacted node will oppose to the decision of the majority. Then, a legitimate node transmitting on a deficient and saturated link may be erroneously considered as a malicious node. Therefore, the consensus process assures that malicious nodes will be properly identified in a neighborhood.

The consensus process seeks to generate full agreement from all participating nodes by combining the outcomes from different nodes to form a consensual judgment about the suspected node. However, if a few nodes do not consent to the verdict, the consensus process applies the decision rule that evaluates if the current level of agreement is sufficient to finalize the decision. For the decision rule, we use the approach: “unanimous agreement minus one vote or two votes” [24].

This decision rule is applied during the consensus process, which collects as much agreement as possible but allows the decision to be finalized without requiring unanimity, i.e., the consensus is reached without the consent of all nodes involved in the decision process. In this case, if some nodes do not completely agree or have a strong objection to the choice made, they will have to accept the decision and live with it. We do not consider modifying the decision of the majority to generate unanimous agreement or unanimous consent.

In realistic WMNs, all nodes in the neighborhood should detect the same routing intrusions with equally accuracy, then reaching consensual decisions about those intrusion attempts with unanimous agreement. However, communication links of a few nodes may have poor quality due to packet collision and congested channels or because these nodes are very distant from the other nodes, then resulting in inconsistent intrusion detection results. Consequently, these nodes will oppose to the decision of the majority. Therefore, to achieve a consensus in such wireless environment with network fluctuations, we employ the decision rule, which does not demand unanimous consent from all of the nodes but the agreement from the majority. This consensus approach is considered fair for WMNs since it is highly improbable to reach unanimous consent in such unstable wireless scenario, and the final decision satisfy the expectation of all nodes, i.e., the legitimate nodes. Nonetheless, if the consensus process of various nodes in the neighborhood, i.e., the majority of nodes, is compromised, an attacker can induce the legitimate nodes to make erroneous decisions, then violating the consensus approach.

The consensus process together with the decision rule is implemented in CCM module as Consensus Algorithms in the form of policy scripts. The CCM module is integrated to the CIDS architecture at script level as seen in Fig. 1. The Consensus Algorithm is detailed as follow.

  • The CCM module analyzes Misbehaving Metrics supplied by the different intrusion detection modules to eventually diagnose if a routing attack is occurring relying on a threshold scheme. For example, Packet Dropping Metrics provided by the Packet Dropping Detection module, which represent, for instance, the rate of routing packets received by a node that are intentionally discarded, are carefully analyzed by CCM module to identify possible packet dropping attack attempts.

  • If strong proof of malicious routing behavior is detected by CMM module, then it consults neighboring CCM modules for sharing Detection Results, i.e., the Misbehaving Metrics related to the alleged routing intrusion.

  • Based on the analysis of the Misbehaving Metrics from the own node and neighboring nodes using thresholds, each node makes its individual decision about the routing intrusion and designates or do not a suspected node as malicious.

  • Thus, the CCM module shares the decisions among neighboring CCM modules, where each node designates a suspicious node or do not designated, if the node finds any evidence of attack occurrence when analyzing the Misbehaving Metrics.

  • If all nodes agree on a common suspected node, then unanimous agreement is reached and the final decision is made on the inculpation of the malicious node.

  • If any node disagrees with the decision of majority, then CCM module applies the decision rule to achieve consensus, i.e., finalize the decision about the suspicious node but without unanimous consent. In that case, there are two possibilities:

    • The majority of nodes make a joint decision on the same node, and then a consensus is reached on the guiltiness of the suspected node.

    • The majority of nodes do not designate any suspicious node, and then a consensus is also reached, which means no routing attack is occurring in the neighborhood and there is no malicious node.

    In both cases, it is likely that a few nodes are suffering from impairments on their wireless communication links or have reduced transmission power, which lead these nodes to make such divergent decisions in relation to the majority.

  • If half of nodes decide upon the same sentence for a suspected node and the other half does not point out any suspicious node, there is no consensus among the participating nodes and no final decision is made. In this particular case, the Consensus Algorithm evokes a passive action from the Response Mechanism to request further investigation.

The Consensus Algorithm of CCM module is exemplified in Fig. 2. For a better understanding, we consider that the algorithm analyses the Message Fabrication Metrics \( {R}^{O_n} \) calculated by node O n . We assume that neighbor N 1 of O n is broadcasting fake routing messages, so the Consensus Algorithm is triggered by O n . Since Routing Events exchanged by the nodes can be lost, the calculated Metrics R are not accurate. Then, a threshold Th is necessary to distinguish between false positive rate and routing misbehavior rate when analyzing the Misbehaving Metrics, as explained in Section 3.4.1. The thresholds \( T{h}^{O_n}=\left\{T{h}_1^{O_n},T{h}_2^{O_n},T{h}_3^{O_n}\right\} \) are defined for O n with respect to Metrics \( {R}^{O_n} \). Th is calculated according to the packet loss rate of node O n ’s link. The Consensus Algorithm follows the steps.

Fig. 2
figure 2

Consensus algorithm for message fabrication detection

  1. 1.

    As \( {R}_2^{O_n}\ge T{h}_2^{O_n} \), node O n suspects a routing attack is occurring in the vicinity, i.e., neighbor N 1 is fabricating packets, and then starts the consensus process.

  2. 2.

    O n sends its Detection Results \( D{R}^{O_n}=\left\{{R}_1^{O_n},{R}_2^{O_n},{R}_3^{O_n}\right\} \) and thresholds \( T{h}^{O_n} \) to neighbors P = {N 1, …, N k } and receives the respective \( D{R}^{N_1},\dots, D{R}^{N_k} \) and \( T{h}^{N_1},\dots, T{h}^{N_k} \) from N i , 1 ≤ i ≤ k.

  3. 3.

    Then, O n analyses all \( D{R}^{N_i} \), particularly \( D{R}^{N_i} \) from N 1, and decide to indicate N 1 as malicious node \( M{N}_{O_n} \). Similarly, the neighbors N i execute the same analysis and indicate their \( M{N}_{N_i} \) or not.

  4. 4.

    So, node O n exchanges its decision \( M{N}_{O_n} \) with the neighbors N i and collect the decisions \( M{N}_{N_i} \).

  5. 5.

    If all neighbors N i indicate the same node as malicious, i.e., \( M{N}_{O_n}=M{N}_{N_i} \), the final decision is made with unanimity, and an active reaction is requested to Response Mechanism.

  6. 6.

    If a neighbor N i indicates a different malicious node \( M{N}_{N_i} \): \( M{N}_{N_i}\ne M{N}_{O_n} \), the decision rule is executed to achieve consensus. In that case, there are three possibilities:

    1. i.

      The majority of neighbors N i indicate N 1 as malicious, then consensus is reached on the guiltiness of N 1, and an Active Response is solicited from the Response Mechanism.

    2. ii.

      The majority of neighbors N i indicate no malicious node, then consensus is also reached on the innocence of N 1, and no response is required from Response Mechanism.

    3. iii.

      Nearly half of neighbors N i designate N 1 as malicious and the other half indicates no malicious node, then the consensus process has failed, but an alarm is triggered for further investigation.

The Detection Results DR and decisions MN are shared between nodes by using Bro variable synchronization mechanism [25], where each modification a nodes does to a given variable, that is declared as synchronizable, it is propagated to the same variable of the neighbors, that likewise defined this variable as synchronizable. This synchronization scheme provides a global view of the Detection Results and individual decisions made by a node to all the nodes in the synchronization list. We assume that Intrusion Detection Results and decision messages exchanged during the consensus process are not compromised, i.e., modified or dropped by malicious nodes. The security of the CIDS communication mechanism and consensus approach is not addressed in this work.

3.4.1 Threshold approach

In case of degradations in the link quality, the CIDS cannot determine if packets are being dropped because of natural packet loss of the wireless links or malicious behavior performed by the adversarial node. Therefore, we develop a threshold-based approach to improve the investigation of Misbehaving Metrics at the CCM module in order to achieve higher detection rate and lower false alarm rate, then enhancing the CIDS detection accuracy.

Since we define a border separating true positive rate from false positive rate, the false negatives are automatically eliminated and the range of false positives is delimited and can be appropriately measured. In our approach, false negatives only occur if Routing Events are discarded and the corresponding routing messages are misused by the attacker. As we assume that Routing Events are not compromised, then false negatives will not be detected during the intrusion detection process.

In addition, the threshold mechanism helps the nodes to make more accurate independent decisions about suspected nodes. For example, a few nodes can be misguided in making decisions resulted from impairments in their wireless transmissions, such high packet loss, consequently a well-behaved node can be unfairly accused if several nodes in the vicinity make such inconsistent decision. Hence, the threshold scheme assures that the majority of nodes will make the correct decision about the detected intrusion and the suspicious node. The threshold approach is explained as follow.

  1. 1.

    For a given WMN deployed, first we check if the Misbehaving Metrics supplied by the different DIDE intrusion detection modules do not indicate the presence of malicious routing behavior. If so, the threshold calculation procedure is triggered by the CCM module, otherwise the threshold calculation is postponed until no evidence of malicious misbehavior is detected in the neighborhood. Due to the intrinsic characteristics of the network, such as noisy areas, some communication links will undergo quality impairments: packet drop or packet corruption.

  2. 2.

    Based on that, the nodes impacted by these transmission limitations will be forced to calculate respective Misbehaving Metrics since Routing Events are unintentionally dropped or corrupted. These Misbehaving Metrics are in fact false positives since no attack is occurring in network.

  3. 3.

    Then, we calculate the threshold values, considering a given period of time, for the corresponding Misbehaving Metrics computed by each node. The threshold is the average number of violations of the related Routing Constraint C within the considered period.

  4. 4.

    The threshold calculation is repeated when necessary, e.g., in case of significant changes in the network topology or in the traffic load caused by mobility of nodes and surroundings.

Defining a proper threshold is important since the primary measure of effective intrusion detection in the network is low false positives and no false negatives. Our threshold approach pre-calculates thresholds for a given network topology considering a given time period. However, if the traffic load of the network changes unexpectedly, i.e., several nodes join the mesh network at the same time and the network topology is redefined, then the calculated threshold could no more possibly reflect the actual network performance in terms of link quality, i.e., packet loss rate due to channel congestion and collision. Then, it is necessary to recalculate the threshold values for all the nodes, particularly to the ones directly impacted by the topology modification and link quality changes.

Therefore, we apply a dynamic threshold approach, where the threshold values are dynamically calculated for all the nodes at regular intervals and calculated separately for neighborhoods where new nodes are added to it.

  1. 1.

    In fact, every 2 h, the threshold calculation scheme, that was previously presented, is triggered by the CCM module for individual nodes to estimate the threshold values for the nodes. Different threshold values are calculated per node according to the type of Misbehaving Metric detected by the DIDE component and the packet loss rate in the node’s links.

  2. 2.

    Alternatively, the threshold calculation algorithm is executed by the node’s CCM module in case of a node moves out or move into the communication range of the node. Timeout values are added to the CCM module, which are triggered before initiating the threshold calculation, in order to avoid hastily triggering the threshold calculation procedure. The CCM module makes use of the mobility management scheme provided by the routing protocol [20] in order to be aware of the current mobility state of each node in the network, i.e., to know what are the current neighbors of each node, and if a particular node is in roaming state or is no more part of the network.

3.5 Response mechanism

The CCM module sends solicitations of Passive/Active Response to the Response Mechanism module according to the response policy defined for each Attacking Method and more specifically attacking technique that are described in Section 3. For instance, selfishness node behavior, which is an attacking technique based on Packet Dropping Attacking Method requires an Passive Response since the damage to the routing service is tolerable and the network administrator has enough time to investigate the attack in detail. However, for blackhole attacks, where all data packets are discarded by the malicious node, an Active Response is demanded because the attack effect on the data delivery service is catastrophic for the network users, therefore a mitigation action has to be applied rapidly.

  • Passive Response: In this procedure, the CIDS firstly outputs all the relevant intrusion information to log files. Then, the CIDS raises alerts according to a notification policy to inform the concerned end-users and authorities about the detected attack and suspicious node behavior. The Passive Response Mechanism is implemented as policy scripts and makes use of Bro logging and notification framework [25] to record the intrusion data and send such alerts to the entities.

  • Active Response: The CIDS tries to mitigate the impact of the attack on the network or tries to deter the attack on the compromised node. Active reactions are executed taking into account the Attacking Method and attacking technique utilized by the attacker. For example, if the adversary is employing Message Fabrication or Message Flooding Attacking Methods, the CIDS discards the phony routing messages generated by the malicious node. This attack response can be carried out in the malicious node, which is more efficient, or in the target nodes, which also works.

4 Performance evaluation

In this section, we use the Virtualized Mesh Network platform to evaluate the efficiency of our distributed intrusion detection approach and demonstrate its feasibility through implementation. The experimental results focus on showing the effectiveness of the CIDS approach in the context of the Message Fabrication Detection module. This platform has also been used in previous works [26, 27] for security analysis and emulation of routing attacks in WMNs.

4.1 Experimental platform

The configuration of the Virtualized Mesh Network is illustrated in Fig. 3. The mesh topology is emulated by VMs composed of QEMU [28] instances, with Linux OS preinstalled, running BATMAN routing protocol. Batman-adv v.2011.1.0 [20] is compiled from source code and loaded into Linux OS as kernel module. A QEMU instance consists of standard IBM PC 32Bit hardware architecture with one CPU, which is the default 32Bit CPU type of QEMU, and 256 MB of main memory. We assume each mesh node in the network has equivalent hardware configuration to the QEMU instance, for instance, a Wi-Fi mesh router or a wireless mesh device with enough hardware resources.

Fig. 3
figure 3

Virtualized mesh network architecture

The QEMU instances, which represent the nodes, are attached to a virtual switch [29] and are interconnected by wirefilter tool [29] which emulates bidirectional (symmetric) links between the nodes and the corresponding link degradations including packet loss, lost burstiness, and bandwidth. For the experiments, we assume a fixed topology, but node mobility could be emulated as well.

The modules of the distributed CIDS architecture, described in Section 3, are implemented using Bro v.1.5.3 [25]. Each node has one configured network interface that is used by BATMAN to broadcast routing packets and by the CIDS for exchanging intrusion detection messages.

During the experiments execution, we make use of Bro and QEMU logging mechanism to collect: (i) intrusion detection data such as Misbehaving Metrics, (ii) routing information such as pcap files, and (iii) performance metrics such as CPU and memory consumption. These data are saved to log files that are synchronized with a global clock, i.e., the time of the main machine that controls the VMs.

4.2 Threshold calculation

To estimate the threshold, we apply the approach detailed in Section 3.4.1. Then, before we start the experiments for intrusion detection, we execute the distributed CIDS approach in the mesh topology to calculate thresholds Th for the respective Misbehaving Metrics R. Figure 4 presents the emulated mesh network, which is comprised of nine nodes: O 1, …, O 9. Link L 1 between nodes O 1 and O 2, and L 2 between O 1 and O 4 have packet loss rate of 15 %, which is considered high. In this experiment, all nodes are well behaving, i.e., no node disseminates malicious messages. Then, we collect the Misbehaving Metrics R calculated by nodes O 1, O 2, and O 4 for defining the thresholds \( T{h}^{O_1},T{h}^{O_2} \), and \( T{h}^{O_4} \). These Misbehaving Metrics are caused by the loss of Routing Events exchanged by these nodes.

Fig. 4
figure 4

Mesh topology for threshold estimation

The duration of the experiment is 30 min. Figure 5 shows the rates of routing misbehavior R 2 and R 3 calculated by nodes O 1 and O 2 in respect to L 1 during the experiment run, which are actually false positives. We note that the maximum value of \( {R}_2^{O_1} \) is 10.4 % and for \( {R}_3^{O_1} \) is 10 %, and the minimum value of \( {R}_2^{O_1} \) is 1.8 % and for \( {R}_3^{O_1} \) is 1.4 %, therefore both rates \( {R}_2^{O_1} \) and \( {R}_3^{O_1} \) are equally affected by the packet loss rate in L 1. Concerning node O 2, we observe the same behavior for R 2 and R 3 since the maximum value of \( {R}_2^{O_2} \) is 3.4 % and of \( {R}_3^{O_2} \) is 2.3 %, and the minimum value of \( {R}_2^{O_2} \) and \( {R}_3^{O_2} \) is 0.4 %.

Fig. 5
figure 5

Misbehaving metrics calculated by nodes O 1 and O 2

Threshold \( T{h}^{O_n} \) for the rate \( {R}^{O_n} \) is defined as: \( T{h}^{O_n}k=\frac{{\displaystyle \sum {R}^{O_n}}}{N} \), where N is total number of the average violations F detected by node O n for a Routing Constraint C in terms of \( {R}^{O_n} \). Table 1 presents the thresholds Th 1, Th 2, and Th 3 calculated for nodes O 1, O 2, and O 4 with respect to the rates R 1, R 2, and R 3. Since the loss of Routing Events does not have influence on R 1, i.e., the packet loss rate does not impact R 1, we fix \( T{h}_1^{O_1}=4\% \), and \( T{h}_1^{O_2}=T{h}_1^{O_4}=2\% \). We note that the thresholds Th 1 and Th 2 of O 1 and O 2 calculated in terms of L 1 are more impacted by the packet loss compared to the thresholds of O 1 and O 4 in terms of L 2. This is caused by the random packet loss model of wirefilter used in links L 1 and L 2 of O 1, which presents different dropping behavior. However, Th 1 and Th 2 of nodes O 2 and O 2 have similar values because both nodes are impacted by equal packet loss rate in their links with O 1.

Table 1 Thresholds calculated for nodes O 1, O 2, and O 4

The estimated thresholds can be applied to larger topologies since they rely exclusively on the pack loss rate between links. Therefore, if the number of neighboring nodes increases and the amount of packets transmitted also increases, but the packet loss rate undergoes small variations, then the calculated thresholds are applicable. Hence, the threshold approach is considered scalable.

4.3 Attack emulation

In this scenario, nodes O 2 and O 4 are compromised, as shown in Fig. 6, and disseminates fake OGMs in the name of gateway node O 1 with O r  = O 1 in order to divert the route of the target nodes toward gateway node O 1 to the malicious nodes O 2 and O 4. As routing decisions are based on statistical analysis of the amount of OGMs received instead of information contained in packets, the malicious nodes have to fabricate a number of fake OGMs that are numerous enough to surpass the legitimate OGMs of O 1 in order to redirect the route of the target nodes. In other words, the malicious nodes have to continuously win the neighbor ranking of the target nodes towards the gateway node, then overriding legitimate OGMs broadcasted by the gateway node. In the work [27], we have presented a detailed study of route manipulation attacks and their consequences for the routing service of WMNs.

Fig. 6
figure 6

Malicious scenario for detection of message fabrication attacks

Since links L 1 and L 2 have packet loss rate (15 %) higher than links L 3 and L 4 (0 %), normally the target nodes prefer a route toward the gateway node via nodes O 3 and O 5. For instance, in the routing table of node O 7, the node O 1’s route entry has as best next-hop node O 6 rather than O 2. However, after the routing attack is executed, the target nodes will change the best next-hop of O 1’s route entry in their routing tables choosing new routes through the malicious nodes O 2 and O 4. This routing manipulation attack aims to disrupt the network routes by violating the integrity of the routing tables.

We implement this routing attack by using the packet generator packETH [30]. For that, we added a new functionality to the tool which allows the malicious nodes O 2 and O 4 fabricating and broadcasting fake OGMs with continuous valid SN on the network interface of nodes O 2 and O 4 in conformance with BATMAN flooding mechanism. In that way, we are able to emulate route redirection attack attacks by executing the modified tool on the compromised nodes O 2 and O 4.

4.4 Results

We carried out several experiments to evaluate the accuracy of our distributed intrusion detection approach. We used different broadcasting rates (500 ms, 1,000 ms, 1,500 ms, and 2,000 ms), i.e., Originator Intervals, for sending fake OGMs. The duration of each experiment is 30 min. Figure 7 shows the rates of routing misbehavior \( {R}^{O_n} \) calculated by nodes O 1, O 2, O 4, O 7, and O 8 whereas malicious nodes O 2 and O 4 broadcasts fake OGMs at rate of 1,000 ms.

Fig. 7
figure 7

Misbehaving Metrics calculated by nodes O 2, O 4, O 1, O 7, and O 8 a Rate of message fabrication calculated by node O 2 b Rate of message fabrication calculated by node O 4 c Rate of message fabrication calculated by node O 1 d Rate of message fabrication calculated by nodes O 7 and O 8

In Fig. 7a, the malicious node O 2 starts broadcasting fake OGMs at 66 s, thus we can remark a large increase of rate \( {R}_1^{O_2} \), and consequently the increase of its neighbors’ rates \( {R}_2^{O_1,{O}_2} \) and \( {R}_2^{O_7} \), as shown in Fig. 7c and d, respectively. Since the detected inconsistencies \( {R}_1^{O_2} \) continues increasing during the attack execution at node O 2, then at 69 s \( {R}_1^{O_2}\ge T{h}_1^{O_2} \) as seen in Fig. 7a. Moreover, at 82 s \( {R}_2^{O_1,{O}_2}\ge T{h}_2^{O_1} \), and at 71 s \( {R}_2^{O_7}\ge T{h}_2^{O_7} \), as seen in Fig. 7c and d. We assume \( T{h}^{O_7}=T{h}^{O_2} \) and \( T{h}^{O_8}=T{h}^{O_4} \) since nodes O 7 and O 8’s links L 5 and L 6 are not affected by packet loss. Since \( {R}_1^{O_2} \) surpasses \( T{h}_1^{O_2} \) earlier than nodes O 1 and O 7, then node O 2 first perceives that possibly Message Fabrication routing attack is happening in the neighborhood and trigger the consensus procedure, as explained in Section 3.4. After exchanging Detection Results \( D{R}^{O_1} \), \( D{R}^{O_2} \), and \( D{R}^{O_7} \) with neighbors O 1 and O 7, CCM module of O 2 realizes that own node O 2 is generating phony OGMs with O r  = O 1, then it designates the malicious node \( M{N}_{O_2}\leftarrow {O}_2 \). Accordingly, nodes O 1 and O 7 conclude that the OGMs received from O 2 for O r  = O 1 are not previously received by O 2 from node O 1, i.e., node O 2 is fabricating fake OGMs for O 1, and identically indicate the malicious node \( \left\{M{N}_{O_1},M{N}_{O_7}\right\}\leftarrow {O}_2 \). Then, the three nodes share their individual decisions. Since all nodes agree on the same malicious node, at this point, a consensus is reached and node O 2 is deemed malicious.

The same intrusion detection process is executed concerning malicious node O 4 that starts fabricating OGMs at 92 s, as seen in Fig. 7b. As a consequence, O 4’s rate \( {R}_1^{O_4} \) rapidly increases and also the rates \( {R}_2^{O_1,{O}_4} \) and \( {R}_2^{O_8} \) of neighbors O 1 and O 8, as seen in Fig. 7c and d. Rate \( {R}_1^{O_4} \) accurately follows the broadcasting rate of fake OGMs at O 4. In Fig. 7b, we see \( {R}_1^{O_4}\ge T{h}_1^{O_4} \) at 97 s. In addition, \( {R}_2^{O_1,{O}_4}\ge T{h}_2^{O_1} \) at 105 s, and \( {R}_2^{O_8}\ge T{h}_2^{O_8} \) at 100 s, showed in Fig. 7c and d. Since \( {R}_1^{O_4} \) exceeds \( T{h}_1^{O_4} \) before than O 1 and O 8, O 4’s CIDS detects a potential Message Fabrication attack in the node and initiates the consensus mechanism as explained before for malicious node O 2.

Possible countermeasures for this type of Message Fabrication attack are: (i) the CIDS at node O 2 drops the fake OGMs broadcasted by O 2 for O r  = O 1 that violate Routing Constraint C 1, (ii) deny network access to node O 2 in the control list of the mesh routers, (ii) alternatively, the CIDSs of nodes O 1 and O 7 drop the fake OGMs received from node O 2 for O r  = O 1 that violate Routing Constraint C 2.

4.5 Detection accuracy

The detection rate, i.e., the true positive rate, is defined as: \( Dr=\frac{F- FT}{ TF} \), where F are the inconsistencies detected by the node during the attack execution, FT are the false positives detected during the threshold calculation, and TF are the total number of fake OGMs broadcasted by the attacker. The detection rate Dr calculated for nodes O 1, O 2, O 4, O 7 and O 8 is presented in Table 2.

Table 2 Accuracy of intrusion detection for nodes O 1, O 2, O 4, O 7 and O 8

For nodes O 2 and O 4, we obtained Dr = 100 %, which is normal since the detection of routing misbehavior in these node relies exclusively on local Routing Events which is not affected by the link quality. Moreover, for nodes O 7 and O 8, we found Dr = 100 % as there is no packet loss in links L 5 and L 6. And for node O 1, we found Dr = 85.30 % and Dr = 85.93 % because of the high packet loss rate (15 %) in links L 1 and L 2 which seriously impacts the intrusion detection rate at O 1. However, in realistic wireless environment, where packet loss rate of links is not so elevated and impacts equally the neighborhood, it is likely that the detection rate of the nodes would be more distributed.

The false positive rate Fr is the same as the ones calculated in Section 4.2, as showed on Table 2. The false positive rate of nodes O 2, O 4, O 7 and O 8 is not correct since these nodes are not influenced by link impairments. Nonetheless, the false positive rate of node O 1 is more coherent but not exact for a real WMN where the packet loss rate of links is equitable among the neighbors.

The false negative rate is defined as: \( Nr=1-\frac{F}{ TF} \). Then, we obtain Nr = 0 % for nodes O 2, O 4, O 7 and O 8, and Nr ≈ 12 % for O 1. In that case, the false negative rate is also impacted by the considerable packet drop rate in links L 1 and L 2. However, if we consider there is no packet loss rate in links L 1 and L 2, we obtain Nr = 0 % for node O 1, since, as said in Section 3.4.1, we assume that CIDS messages are not compromised, then all packets fabricated and disseminated by the malicious nodes are precisely detected by O 1. Therefore, the distributed CIDS implementation presents a good performance with high detection rate and low false positives and false negatives in scenarios where the network performance is degraded, i.e., the communication between CIDSs is deteriorated.

4.6 CPU and memory consumption

The computation overhead of the CIDS instance consists of: (i) capturing routing packets from the network interface; (ii) processing the routing packets; (iii) generating Routing Events; and (iv) executing the attack detection scripts. In addition, the CIDS exchanges Routing Events with neighboring CIDSs through IP packets and saves all the related intrusion detection information to log files.

We measured the CPU and memory usage at node O 1 during the intrusion detection operations considering two scenarios: (i) S1: no malicious node is present as described in Section 4.2; (ii) S2: the attacker performs message fabrication attack, as presented in Section 4.3.

CPU and memory consumption are measured by employing sysstat performance measurement tool [31]. Then, we executed the pidstat command of sysstat to monitor the percentage of CPU time used by Bro task while executing at system level, i.e., Linux kernel space, and the percentage of memory used by Bro task with respect to the total physical memory. After that, we collected the performance log data from the VM. Figures 8 and 9 show the computation overhead resulting from the execution of the CIDS tool at node O 1.

Fig. 8
figure 8

CPU usage for the CIDS at node O 1 a CPU consumption over time b Cumulative Distribution Function (CDF) of CPU consumption

Fig. 9
figure 9

Memory usage for the CIDS at node O 1

In Fig. 8a and b, we note that the CPU utilization for the CIDS in scenarios S1 and S2, i.e., the percentage of CPU time spent by the CIDS while executing at the kernel, is fair considering all the operations executed by the CIDS during the intrusion detection process. In this scenario, the average CPU usage is 20.04 % while the highest utilization is 26.53 % and the lowest consumption is 13 %, as shown in Table 3. However, for scenario S2, we observe a reduction of CPU usage to 16 % at the moment the attacker starts broadcasting fake OGMs in the network interface of O 1, i.e., at 223 s, as shown in Fig. 8a. In S2, before the attack is initiated, the average CPU usage is 17.59 %, having a peak of 22 % and the lowest usage is 11.22 %, which is less than the CIDS’s CPU consumption in S1, as presented in Table 3. Nevertheless, after the attack is triggered the mean CPU usage is 18.04 % having a minimal utilization of 10 % and maximal usage of 26.53 %. However, for the entire experiment duration in S2, we notice the average CPU utilization is reduced to 17.96 %, with a difference of 2.1 % compared to the average CPU usage in S1. This reduction of CPU consumption in S2 compared to S1 is more remarkable in the Cumulative Distribution Function (CDF) chart of Fig. 8b.

Table 3 CPU statistics for the CIDS at node O 1

In scenario S2, the CPU should be more occupied by the CIDS since the intrusion detection operations are more intensive, such as routing packets treated per second, Routing Events generated and exchanged, and the intense logging of intrusion detection information, in consequence of the attack execution at O 1’s neighbors O 2 and O 4. Nonetheless, we perceive an opposite behavior from the CIDS, i.e., the average CPU usage decreases when the traffic load increases. That unusual behavior is attributed to a “software bug” in Bro communication framework, where Bro process keeps an initial CPU usage of about 30 % regardless of whether there is network traffic to analyze, but if the traffic load increases, the CPU consumption is progressively reduced until it is stabilized.

In Fig. 9, we note that memory consumption in both scenarios S1 and S2 gradually increases over the experiment execution. In Table 4, the average memory consumed in S1 is 7.28 % of 256 MB of main memory that is equivalent to 18152.74 Kbytes (17.73 Mbytes), which is indeed the physical memory used by the CIDS. In S2, the upper limit of memory utilization is 8.82 %, that is during the attack execution, and the lower limit of memory usage is 5.62 % which is before the attack execution, as presented in Table 4. Then, in S2, we notice an increase of 3.2 %, i.e., 7.78 Mbytes, during the intrusion detection process, which is acceptable taking into consideration all the attack detection operations performed by the CIDS. Therefore, the intrusion detection operations performed by the CIDS incur a small memory overhead regarding the main physical memory size. Since a common wireless router is equipped with 64 MB of RAM memory, then the memory overhead added by the CIDS is low.

Table 4 Memory statistics for CIDS at node O 1

The CPU and memory consumed by the CIDS do not achieve high levels even so the number of packets/sec increase on the node interface and consequently the intrusion detection activities. Therefore, the intrusion detection mechanism is scalable in respect to computational overhead. The CPU and memory overhead maintain at a stable level of about 18 % (CPU time) and 7.17 % (17.45 Mbytes) respectively during the distributed CIDS operations. According to Bro’s user manual, if CPU usage is less than 50 % and memory usage is less than 70 % of physical memory, then the CIDS is able to perform accurate analysis, i.e., the CIDS will not drop packets due to lack of computational resources.

4.7 Communication overhead

We calculate the message overhead using the scenario of Section 4.2. Since the RPA module has to analyze each OGM message and generates a respective Routing Event to the neighbors, Routing Event messages are the main source of message overhead during the intrusion detection. The message overhead formula for Routing Events RE in terms of average message size (in bytes) is \( M{O}_S=\frac{\frac{{\displaystyle \sum \left| RE\right|}}{N}}{\frac{{\displaystyle \sum \left| OGM\right|}}{n}} \), which is the ratio of message overhead for the average RE message size in relation to OGM packet size, where N = ∑ RE and n = ∑ OGM. The message overhead formula for RE in terms of message frequency is \( M{O}_F=\frac{{\displaystyle \sum RE}}{{\displaystyle \sum OGM}} \), which is the ratio of message overhead for RE exchanged by the node compared to OGM packets transmitted by the node during the experiment.

For node O 1, we found \( M{O}_S^{O_1}=9.1\% \), and for node O 2: \( M{O}_S^{O_2}=8.9\% \). We see that \( M{O}_S^{O_1}>M{O}_S^{O_2} \) since average RE message size: \( \frac{{\displaystyle \sum \left| RE\right|}}{N} \) is smaller for node O 2 as O 2 has less neighbors than node O 1, then O 2 process less RE messages. We find that for both nodes the calculated MO S is fair considering that RE are transmitted using TCP protocol, which adds considerable packet overhead compared to BATMAN ethernet frames on Layer 2. The average size of a RE TCP packet is 271 bytes while the size of an OGM packet is only 44 bytes, then the RE message overhead is considered reasonable. The reason for MO S behavior is that a unique TCP packet transports various RE messages as the effect of analysis of several OGMs packets.

Concerning MO F , we estimated \( M{O}_F^{O_1}=8\% \) and \( M{O}_F^{O_2}=6\% \). For this message overhead, we also note that \( M{O}_F^{O_1}>M{O}_F^{O_2} \) because the number of RE is higher for O 1, as said before, attributable to the higher traffic processed by O 1. The calculated MO F , with respect to the number of packets, is low considering that ACK packets are also taken into consideration for the overhead calculation. The reason for the behavior of MO F is the same: various RE messages are aggregated into a single TCP packet then reducing the number of TCP packets exchanged between the nodes.

However, if we calculate the ratio of message overhead for the total RE message size: \( M{O}_T=\frac{{\displaystyle \sum \left| RE\right|}}{{\displaystyle \sum \left| OGM\right|}} \), we find \( M{O}_T^{O_1}=73.1\% \) and \( M{O}_T^{O_2}=53.1\% \), which is considerably high compared to OGM packet overhead, since a priori, as explained before, RE messages are essentially exchanged using TPC packets which incurs significant message overhead with respect to the OGM packets.

The average transmission rate of RE messages for O 1 is 0.939 Mbit/sec, and the average transmission rate of OGMs is 0.012 Mbit/sec which is 1.4 % of the RE transmission rate. Then, the total average bandwidth overhead is 0.952 Mbit/sec, which is appropriate for a WMN where common mesh routers are equipped with wireless cards having potential transfer speed of up to 108 Mbps. Thus, the network communication could afford this kind of communication overhead imposed by the CIDS communication mechanism along with BATMAN routing protocol. In that case, the distributed CIDS approach is scalable in the matter of communication overhead.

5 Related work

In this section, we review related cooperative IDSs and CIDSs architectures for WMNs and compare our work with these approaches. Since nodes in WMN have partial view of the network data, a cooperative IDS architecture is preferable than stand-alone architecture to provide sufficient traffic information to each node’s detection engine in order to reach the correct decision about the detected intrusions and misbehaving nodes.

Zhang et al. [17] proposed one of the earliest cooperative intrusion detection architecture for ad-hoc networks composed of six modules. A local data collection module gathers audit data, which is analyzed by the local detection engine module. If anomaly is detected with strong evidence, the IDS initiates a response action by triggering the local response module, which raise an alert, or the global response module. If anomaly is detected with inconclusive evidence, the IDS requests the cooperation of neighboring IDSs for global detection using the cooperative detection module, which communicates to the other IDSs through a secure communication module. Nonetheless, the authors do not present details of the cooperative detection algorithm, i.e., which type of network data is exchanged. In addition, they only evaluate the local detection engine module for detecting intrusions locally.

Marti et al. [32] proposed watchdogs and pathraters. A watchdog runs on each node to monitor the behavior of neighboring nodes by promiscuously listening the neighbors’ transmissions. When a node forwards a packet, then the node’s watchdog verifies that the next node in the path also forwards the packet. The pathrater mechanism avoids routing packets through detected misbehaving nodes. But the approach has some limitations, e.g., since the watchdog must have knowledge of the route then it is limited to source routing protocols. In addition, the watchdog cannot detect a misbehaving node in the presence of packet collisions, neighbors with weak signal, and collusion between malicious nodes. The intrusion detection approach proposed by Marti et al. cannot detect the presence of partial packet dropping attacks, where the attacker drop packets below the threshold established.

Yang et al. [33] introduced SCAN, a network layer IDS for detecting misbehaving nodes. In SCAN, each node monitors the routing and packet forwarding behavior of its neighbors in promiscuous mode as in watchdogs [32], and detects any misbehavior in the neighborhood. A consensus scheme is implemented, in which a node is convicted of being malicious only after multiple neighbors have reached such conclusion. SCAN has the same disadvantages as Marti’s watchdogs since it is limited to AODV protocol that is a source routing protocol. In SCAN, the detection of routing misbehavior is only possible by adding new fields to AODV RREQ and RREP routing messages so each node can maintain part of the routing table information from its neighbors. In addition, SCAN cannot detect packet forwarding misbehavior in the presence of neighbors with weak signal transmission. Moreover, SCAN cannot distinguish packet drop due to collisions from intentional packet dropping. Another limitation is the need for multiple neighboring nodes to collaborate in order to make the final decision about the suspicious node.

Komninos and Douligeris [34] proposed a layered intrusion detection framework (LIDF) to detect compromised nodes. LIDF consists of collection, detection, and alert modules, which operates locally in every node. The collection module collects audit data at OSI link layer for checking one-hop connectivity and frame transmission, and at network layer for checking routing and data packet forwarding. The detection module processes the collected audit data to perform anomaly-based detection. The alert module is responsible for notifying the neighbors and request additional audit data to reach a more precise decision regarding the suspicious node. However, the approach does not specify in which attack scenarios the cooperative detection scheme is triggered. In addition, the cooperative scheme proposed by the authors requires that each node have at least three neighbors for functioning in reliable way, but it is vulnerable to blackmail attacks. The authors claim the approach is able to detect malicious packet dropping, but details of packet dropping detection mechanism are not given neither experimental evaluation is presented.

Recently, Saxena et al. [35] propose a hierarchical architecture for detecting selfish behavior in WMNs by dividing the mesh routers into clusters. Monitoring Agents (MAs) installed on specific mesh routers watch the behavior of neighboring routers in the cluster by collecting periodic traffic reports and sending them to Sink Agents (SAs), hosted at the mesh gateways. SAs process the reports and detect the presence of selfish mesh routers. The approach differentiates between intentional dropping and packet drop due to poor link quality in order to decrease the number of false positives. However, the constant transmission of traffic reports from mesh routers to MAs and from MAs to SAs incurs extra communication overhead to these nodes. In addition, mesh routers elected as cluster-heads and mesh gateways are unfairly overloaded, which imposes a high processing cost since these nodes must frequently process traffic reports, and they are vulnerable to man in the middle attack and blackmail attacks. Moreover, cluster-heads may become critical point of failures in the IDS architecture.

A key point in the existing cooperative and distributed IDSs approaches is that they have been exclusively validated using simulation-based studies or theoretical models, then lacking experience in real protocol implementations for WMNs. Simulation is useful to estimate the performance of intrusion detection engines, which are complex to implement as IDS tools and difficult to deploy at nodes in the ad-hoc network infrastructure. However, simulation cannot fully represent the implementation details and performance requirements of the CIDS capturing network traffic, analyzing the collected data, and exchanging intrusion detection information with neighboring nodes, all those tasks executed at real-time in a mesh device with constrained hardware resources. As a result, these approaches can be shown to be unpractical in real WMN scenarios.

These approaches are very promising but they do not demonstrate the viability of the cooperative detection scheme where neighbors are supposed to share audit data and detection results in order to give a consensual verdict. In addition, the existing solutions are not well suited for WMNs as they assume the existence of a reliable communication channel between the nodes, which is not a valid assumption for wireless scenarios because of interferences and periods of link congestion. Except the recent work [35] that takes into consideration the effect of link quality on the detection engine performance. For that reason, we decided to implement our approach on a Virtualized Mesh Network platform which allows representing real link limitations, such as packet loss rate and lost burstiness. The VMs own hardware configuration that is equivalent to a mesh device executing the distributed CIDS approach. This virtualized environment allows demonstrating how the cooperative CIDS architecture can be applied to detect routing attacks and malicious nodes in more realistic WMN scenarios in terms of implementation details and performance requirements for the IDS tool.

6 Conclusion

In this work, we presented a distributed intrusion detection methodology for WMNs which is integrated to the popular IDS tool Bro used in production environment. The CIDS is responsible for monitoring the traffic at the node and exchanging Routing Events with neighboring nodes in order to identify possible routing misbehavior in the vicinity. We employ a monitoring based detection approach which precisely detects routing attacks that violate the routing functionality of nodes, for instance, in case a malicious node is diffusing incorrect routing packets through the network. The CIDS monitors the routing activity of the neighborhood over a period of time and set thresholds to distinguish malicious routing behavior from communication failures, i.e., the false positives. Then, we propose a consensus scheme that permits a group of nodes to be flexible enough to make a decision when it is convenient, e.g., the majority agrees upon a verdict, while still adopting the main concepts of consensus decision-making process. The intrusion detection approach provides rapid and efficient malicious traffic detection for collaborative recognition of routing attacks in a neighborhood. It should be noted that the approach is adaptable to most routing protocols and can be applied to detect a wide range of attacks.

The distributed intrusion detection solution has been validated by implementation in a virtualized mesh network environment. Experimental results show that the intrusion detection scheme has small false positive rate and false negative rate, and high detection rate. In addition, the attack detection algorithms take into account the constrained hardware resources of the node such as CPU and memory, then appropriately fitting in mesh devices owning restricted computational resources. Despite the approach presents a good efficiency, the message overhead introduced to the communication system is not the minimum possible overhead for the communication system.

As a future work, we envisage reducing the communication overhead of the CIDS communication mechanism without affecting the accuracy of the distributed CIDS scheme, i.e., routing attacks should be detected with equal precision respecting the principles of distributiveness and cooperativeness introduced by the approach. A solution is to keep of list of trusted nodes and watches the traffic of suspected nodes instead of monitoring all the nodes in the neighborhood. A second alternative is to monitor a random choice of neighboring nodes at regular intervals. The use of UDP protocol in place of TPC protocol for the CIDS communication would likewise reduce the message overhead. Furthermore, we will implement mobility models in the virtualized mesh network platform, for instance, based on arbitrary movement of nodes such as Random Waypoint model, to evaluate the CIDS approach efficiency in scenarios with roaming nodes.