Keywords

1 Introduction

Intrusion Detection Systems plays a key role in security of modern day software applications/systems. They compare observable behaviour in the system against suspicious patterns to identify any kind of intrusions. There are two variances of IDS: Network based (NIDS) or Host based (HIDS). Traditional IDSs have a problem that they work in isolation and therefore have higher chance of getting compromised by unknown or new threats. A Collaborative IDS solves this problem by having peer IDS help each other out and get aided by shared collective knowledge and experience from peers. This increases both the accuracy and the ability to detect new intrusion threats. Collaborative IDS assumes that all IDSs will honestly cooperate. The lack of trust management leaves the system vulnerable to malicious peers [1].

Few IDSs have been produced to cooperate honestly based on trust and/or distributed trust models but they have not incorporated any kind of incentives for IDS collaboration. Incentives are important criteria any collaborative system otherwise it will suffer from “free rider problem” in which certain group of IDSs only keep on asking for assistance but may not actually contribute to the system. Thus, this leads to degradation of performance of the system. So we need to take care of incentives while designing such a system [2]. The distributed collaborative is preferred over centralized as it need rely on a central server to gather and analyze alerts and thus avoiding any bottleneck problems [3]. We just need to take care of any malicious or malfunctioning IDS which can degrade the performance of the system (Fig. 1).

Fig. 1
figure 1

Overview of system

In this paper, we propose a trust-based IDS collaboration network system with incentive based scheme for resource allocation. In this the amount of resources allocated by each IDSs to their neighbors is dependent on the trustworthiness and resources already allocated by its neighbors to help it.

An optimization problem is constructed to aid the IDS to optimally allocate resource to maximize the satisfaction level of each of its peers. We show that under certain controllable system conditions, there exists a unique Nash Equilibrium. We use a Bayesian trust management model based on Dirichlet family of probability density functions to predict the likely future behaviour of an IDS based on its past records.

The uncertainty associated with IDS gets pacified due to this model. Acquaintance is managed between the IDSs using the estimated trust values to enhance the accuracy of the system. Dirichlet function with game theoretic approaches makes the system even more robust and efficient. We now try to introduce the concept of reinforcement learning to make the system more secure and robust. Reinforcement learning takes environment into account to maximise the overall satisfaction levels. Markov decision process (MDP) are generally used to formulate the environment. Since Nash Equilibrium technique is limited for a single attack scenario we will be considering each attack as a separate state and model it as a MDP for effective detection of Intrusion in our network.

Section 2 tells us about the related work which have been done in this field. Section 3 discusses about the motivation to carry out this project. Section 4 describes the Architectural Overlay of the system. Section 5 discusses our proposed work in detail. Section 6 gives us an analysis of why the proposed work is better than any other current methods. Section 7 gives us the conclusion and the plans for future work.

2 Related Work

In one of the work presented by Zhu and Tamer [4], they propose a trust management system which involves IDSs to build trust rapport by exchanging test messages. In this each IDS sends a trace of possible attacks from its own database (one with risk level of each attack), to its acquaintances to test their trustworthiness. Each of these peers then sends back the risk values of each of those attack in trace. The sender then cross-checks these values with its own knowledge database and generates a satisfaction level for each feedback using “a satisfaction mapping function”. In [5], we see the use of a simple weighted average model to estimate the trust value while in [6] Bayesian statistics model is used to calculate the trust value.

In Bartos et al. [7], a distributed model is presented which helps to collaborate multiple ids sensors which are heterogeneous in nature. This model assumes to be able to monitor ids in different locations using multiple detection sensors. It uses game theoretic approaches in dynamic environments to optimize behavior among the sensors. They propose a hand between defenders and attackers as a model along with a trust based model e-fire to collaborate in such a highly dynamic environment while trying to prevent any kind of poisoning or manipulation by malicious attackers.

In Fung et al. [3], they propose a Trust management using Dirichlet distribution, to calculate trust based on mutual experience between ids. An acquaintance algorithm is proposes to manage its peers based on the trust value. Intrusion detection system is like a two player game between the attacker and the intrusion detection system as two opposing players. In Alpcan and Basar [8], a two person, finite game has been portrayed between each sensor of the ids based on cooperative game theory. In papers [911], we see the use of non-cooperative game frameworks in intrusion detection system. In Liu and Comaniciu [12], we can see the use of Bayesian game techniques in intrusion detection for ad hoc networks, using a two-person non-zero-sum incomplete information game as a framework for the system.

In Zhu et al. [2], they show that there exists a Nash equilibrium state and its unique, amongst the peers so they can communicate in cooperative manner. They were able to develop an algorithm which is iterative and converges geometrically to the equilibrium. In Allazawe et al. [13], we get an overview of traditional IDS and its challenges and how game theoretic approaches help and its limitations.

In Lye and Wing [14], they show the use of game theory in the field of security of computer networks. They construct a two-player stochastic game model to visualize the interactions between attacker and administrator. They use non-linear program to evaluate the Nash equilibrium state or the best possible strategies for both players taken into account. These strategies can then be used by administrators to improve the security of the system.

In Alpcan and Tamer [15], A game theoretic based model is made for the sensors observing and reporting attacks to the IDS as a finite Markov chain. Therefore a two player stochastic Markov game is observed depending on the information on the players. It captures various intricacies of the system. Both MDP and Q-learning methods are used to build the foundation for development of various strategies for the players. As we can see trust management and game theoretic approaches have not been used in together as a collaborative system to enhance the robustness of intrusion detection system as a whole.

3 Motivation

The most important flaw of Intrusion Detection Systems based out of game theory is that it assumes the adversary’s rationality. It assumes that it would take steps to obtain a maximum gain (or near maximum) for itself or its interests. However, in reality, this may not always be the case as human behavior is still unpredictable. So it is hard to discern whether the users attack is rational or whether he/she is simply trying to confuse the IDS by employing another attack vector. Another area of weakness of the game theoretic approach to intrusion detection is the unsolved approach on how to detect and handle simultaneous attacks [14].

Most of the systems available are either optimized for Collaborative Intrusion Detection Networks [CIDN] (group of Intrusion Detection System (IDS)) or for a single IDS. There is no generalized system available for both CIDN and IDS.

4 Architecture

Our architecture can be quite similar to Fig. 2 and also to Fig. 3. The consolidated architecture can be described as network of different IDS. The system will have a group of IDS’s who are connected to each other via Internet. Each IDS will in turn consist of group of computers which are mostly connected via LAN. So in short, there will be communication between clients in a network forming a IDS as well as multiple IDS communicating together.

Fig. 2
figure 2

Individual IDS

Fig. 3
figure 3

Architecture overlay

There will be 2 types of messages which are being transferred through our system:

  1. 1.

    Local Messages

  2. 2.

    Global Messages

Local Messages are the messages which will be transferred in between a given IDS (LAN connected) and Global messages will be transferred in between different IDS. The messages which are transferred is explained in depth in the next section. Our system can be a mixed of Anomaly detection system and Misuse detection system.

5 Proposed Work

5.1 Dirichlet Based Collaborative Intrusion Detection System

We are connecting Intrusion Detection Systems to form a collaborative network. Here each IDS can choose its collaborative peers. The supporting peers will have varied expertise levels of detecting the intrusions. Our system has following features:

  • An algorithm for intrusion detection systems which is effective, fair and incentivisable which helps in managing their acquaintances from which they can ask opinions about intrusions;

  • A trust management model to reduce the negative impact of low expertise IDSes, dishonest IDSes and discover compromised ones

  • Detection of Malicious insider activity

  • System should be scalable and highly elastic in terms of trust evaluation, network size, and assessing the intrusion.

Links between IDSes indicate their collaborative relationships. Each node maintains a list of acquaintances whom it trusts the most and collaborates with. Nodes communicate by means of intrusion evaluation requests and corresponding feedback. There are two types of requests:

Intrusion Consultation and Evaluation Requests Whenever a suspicious behavior is detected by IDS and the expertise level still remains insufficient to make a decision, it sends other requests to the other friend IDSes for consultation. Feedback from the friend IDS’s is cumulated and aggregated. A final conclusion is developed using those feedbacks. The information provided to friend IDS depends on the trust level of each friend.

Fake Test Messages Nodes in the Collaborative Intrusion Detection Network use these fake test messages for finding out the trust of IDS. These messages are “fake” consultation requests, which are formulated in a manner that makes them difficult to be distinguished from real consultation message requests [3].

The content of these messages depends on various factors. Some of them can be:

  1. (1)

    Type of Attack

  2. (2)

    Number of Peers

  3. (3)

    Topology of Network etc.

A behavioral graph is made for describing the activities. The testing node has prior information about the true result of the fake testing message. It then uses the received feedback to derive at a trust value for the other nodes in the network. This can be done using standard machine learning and regression techniques. This will help us in identifying malicious nodes. IDSes use different metrics to rank and rate alerts. Let’s assume there exists a function H, which maps Intrusion Detection System’s alert ranking to a [0, 1] interval where 0 denotes minimal level of traffic and 1 highly dangerous intrusions. H follows more severe than partial order relationship which means if an alert aj is more severe than alert ai then H preserves that relationship by having H(aj) > H(ai). The satisfaction of feedback is find out using these factors:

  1. 1.

    The received answer (a ∈ [0, 1]),

  2. 2.

    The difficulty level of the test message (d ∈ [0, 1]) and

  3. 3.

    The expected answer (r ∈ [0, 1]).

More the value of d, more difficult it would be to answer the request correctly. The difficulty of a test message can be determined by the age of signatures. The difficulty level is low for the messages generated from old signature; medium difficult for messages generated using new signature and high difficult for malicious traffic taken from previous attacks.

We can measure the quality of feedback using a function Sat(r, a, d) (∈ [0, 1]) for representing the level of satisfaction in the received answer which depends on the distance from the expected answer and difficulty of the message. It can be written as follows:

$$Sat(r,a,d) = \left\{ {\begin{array}{*{20}c} {1 - (\frac{a - r}{{\hbox{max} ({\text{c}}_{ 1} r , 1- r)}})^{{d/c_{2} }} } & {a > r} \\ {1 - (\frac{{{\text{c}}_{ 1} (r - a)}}{{\hbox{max} ({\text{c}}_{ 1} r , 1- r)}})^{{d/c_{2} }} } & {a \le r} \\ \end{array} } \right.$$
(1)

where c1 is used penalizing the wrong estimates. c1 value is always >1 to show that the estimates which are lower than exact answer are penalized strongly than those that are higher. c2 ∈ R+ controls the satisfaction sensitivity, where large values means more sensitive to the distance between the correct and received answers. This equation makes sure that levels with low difficulty are more severe in their penalty than incorrect answers [3].

Our interest lies in finding out the distribution of the satisfaction levels in the answers provided by each peer IDS. We use this information for estimating the satisfaction level. We use a beta distribution with binary satisfaction level (satisfied, —satisfied). Here we are using Dirichlet distribution [4] for solving our problem which involves multi-valued satisfaction levels. This kind of distribution is most suited for our trust management model since the trust is updated based on the history of interactions.

We consider X as a discrete random variable which signifies the satisfaction level peer’s feedback. X is chosen such that X = {x 1 ,x 2, … ,x k } (x i  ∈ [0, 1], x i+1 > x i ) are the different levels of satisfaction. The trustworthiness of a given IDS or peer would be an input to our Game theoretic model for IDS.

$$T^{uv} = E[Y^{uv} ] = \sum\limits_{i = 1}^{k} {w_{i} E[p_{i}^{uv} ] = \frac{1}{{\gamma_{0}^{uv} }}} \sum\limits_{i = 1}^{k} {w_{i} \gamma_{i}^{uv} }$$
(2)

where,

  • p uvi , denotes the probability that peer v provides answers to the requests sent by peer u with satisfaction level xi.

  • γ uv i is the cumulated evidence that v has replied to u with satisfaction level x i .

  • wi, an associated weight of each satisfaction level xi

  • Yuv be the random variable denoting the weighted average of the probability of each satisfaction level in puv.

5.2 Game Theoretical Model for Intrusion Detection System

Prevention of intrusion is just set of interaction between the IDS which protects a target system (TS) and the user. This situation can be studied in detail using Game Theory. A game (stage game) is constructed to model the interaction between user and IDS. It can be considered as an infinitely repeated game. The solutions to the stage game and to the repeated game are then given and interpreted. Using this model we can predict user intentions and preconditions for an attack and thus we can prevent any insider intrusions and outsider attacks.

We model the interactions using a two player stochastic game between the user and the IDS. A N-node intrusion detection network is considered in our model. Let the n nodes be denoted by N = {1, 2 ,…, N}. N du will represent the set of neighbors for peer u with maximum distance d ∈ R+, i.e., N du  = {i ∈ N: dist(i, u) <= d, i <= u}, where dist: N N –> R+ is a distance function measuring the distance between two nodes. N1 u will be same as N-u because it will contain all nodes except u. There is symmetric information flow in the network. R vu represents the set of resources demanded by v from u for its full satisfaction. Whereas the minimum acceptable set from u to v is represented by mv u. Let p uv ∈ R+ be the set which u actually allocates to v. This parameter is decided by u and is private to u and v. Therefore, to satisfy v this should lie over the interval [m vu , r vu ].

We assume that we are aware of trust values between each peer and its neighbors, as its a distributed trust management system. Let Tuv ∈ [0, 1] represent this trust value from u to v. puv parameter is dependent on this trust value Tuv perceived by u. Each of the peer will try to maximize its effort in order to satisfy its neighbors but having a capacity constraint Cu, determined by its own resource capacity such as bandwidth, CPU, memory, etc. Following relation will hold true:

$$\sum\limits_{{v \in {\mathcal{N}}_{u}^{d} }} {p_{uv} { \leqslant}C_{u}, \quad {\text{for}}\,{\text{all }}u \in {\mathcal{N}}} .$$
(3)

In this model, an utility function NSuv is defined to model the satisfaction level from a peer to its neighbors. It is defined as follows:

$$S_{uv} = \frac{{{\text{ln(}}\alpha \frac{{p_{uv} - m_{vu} }}{{r_{vu} m_{vu} }}{ + 1)}}}{{{\text{ln(}}\alpha { + 1)}}}$$
(4)

where, a ∈ (0, 1), a system parameter to control satisfaction curve, ln(a + 1), the normalization factor, ln chosen because of its property of proportional fairness. We will set the net satisfaction S as Trust (which we got from Dirichlet Solution) multiplied by Satisfaction level of neighbors.

$${\text{S}}_{\text{uv}} = {\text{ NS}}_{\text{uv}} \times {\text{ T}}^{\text{uv}}$$
(5)

Let Uu: R L(u,d)+  → R+ be the peer u’s aggregated altruistic utility, where L(u, d) = card(N du ), the cardinality of the set N du . Let the payoff function, Uu, for u be given by:

$$U_{u} = \sum\limits_{{v \in {\mathcal{N}}_{u}^{d} }} {w_{uv} S_{uv} w_{uv} = T_{v}^{u} p_{vu} }$$
(6)

where, wuv, weight given on v’s satisfaction level Suv. More the trust, more is the weight. In this model, every peer u ϵ N tries to maximize Uu within its resource capacity. To find optimum value, following function can be devised:

$$\begin{array}{*{20}c} {\max_{{\{ p_{uv} ,v \in {\mathcal{N}}_{u}^{d} \} }} } & {\sum\limits_{{v \in {\mathcal{N}}_{u}^{d} }} {w_{uv} S_{uv} } } \\ {{\text{s}}.{\text{t}} .} & {\sum\limits_{{v \in \, {\mathcal{N}}_{u}^{d} }} {p_{uv} \, { \leqslant} \, C_{u} } } \\ {} & {m_{vu} \, { \leqslant} \, p_{uv} \, { \leqslant} \, r_{uv} \forall v \in {\mathcal{N}}_{u}^{d} } \\ \end{array}$$
(7)

where, Suv and wuv have been previously defined in Eqs. 2 and 3 respectively. As we observed earlier every peer needs to find an optimal value and thus an optimization problem (OP) to solve. This problem is a concave one in which the function is a concave function in puv constrained by the cardinality of the set N du . We assume the size of whole network is large and peers within a radius d can only communicate with each other and thus N independent optimization problems are there. Therefore game can be modeled by triplet (N, Au, Uu), where Au is action set for peer u for u ∈ N, Uu is the payoff function defined in Eq. 3 and N is the size of network [2]. Action here means allocation of resources. Action set, Au = {pu ∈ RL(u,d)+ − ∑ v ∈ N du puv <= Cu} ∩ {pu ∈ RL(u,d)+ − m vu  <= puv <= rvu, v ∈ N du }. Following condition shows that action set is non-empty.

$$C_{u} {\geqslant} \sum\limits_{{v \in {\mathcal{N}}_{u}^{d} }} {m_{vu} }$$
(8)

Lagrange relaxation can be used to solve for Nash equilibrium. Lagrangian of peer u’s optimization problem based on three lagrangian multipliers can be used to devise a relaxed game model [2]. Using this relaxed model we can try to solve for Nash Equilibrium. First order KKT condition can then be applied to this optimization problem.

Since our system is comprised by taking the positive effects of Game Theory and Trust Management models, it should theoretically perform better than the available Intrusion Detection Systems. We can also refer from the results of Zhu et al. [1] and Fung et al. [13] for more introspection on the implementation details.

5.3 Extended Game Theoretical Model for Intrusion Detection System

The model proposed using game theoretic and trust management models can further be decomposed into smaller manageable components so we can figure out strategies individually for each sub model using Markov decision process along with game theoretic methods. Subgames can be defined using nearly isolated clustered state and MDP can be defined for states which has meaningful actions for only one player. Using the strategies from each sub model or components we can find the overall best-response for each player. The computation time is said to be reduced significantly using a such a decomposition method [1]. Even different attacks can be detected using Markov decision process.

We propose to use reinforcement learning to enhance the game theoretic model for IDS. Markov decision process is being used to solve reinforcement learning. In MDP, if the current state is some state s, an action available to s is then chosen by a decision maker a. The process randomly moves to a new state s′ with corresponding state transition function Pa (s, s′) and reward Ra (s, s′). Therefore, s′, the next state depends on decision maker a and the current state s. The state transitions satisfies the Markov property [2], which states that “effects of an action taken in a state depend only on that state and not on the prior history”.

The MDP for our system is defined using filtering variables:

  • S = State corresponding to one attack vector

  • A = Actions which are either reporting attack or remaining silent

  • P = Probability of transition is specific to the current state that is the attack vector.

  • R = Reward is directly proportional to the trustworthiness factor taken from Dirichlet distribution

  • γ = this depends on the Nash equilibrium stabilization time.

One of the most critical part in any MDP is the decision maker’s policy [16]. This policy is defined by a function ∏ that allots an action ∏(s) to a state s. Policy function ∏ should be chosen in such a way to maximize sum cumulative function of the rewards. We need to take care of following criteria:

  • The trustworthiness of the IDS

  • The Nash equilibrium state

  • Different attack attributes

  • Rewards for moving from one state to another

Using the knowledge of the reward function R and the state transition function P, we try to devise a policy to maximize the discounted rewards.

Calculation of optimal policy can be done using standard set of algorithms which requires two arrays indexed by state: V, set of real values; ∏, the set of actions. The algorithm results to give ∏ the solution and V(s), the discounted sum of the rewards that can be gained by following that solution from state s [17].

The algorithm is a two-step one, these two steps are repeated for all states until the result becomes constant and no further changes are observed. They are defined as follows:

$$\pi (s): = \arg \mathop {\hbox{max} }\limits_{a} \left\{{\sum\limits_{{s^{{\prime}}}} {P_{a} (s,s^{{\prime}})(R_{a} (s,s^{{\prime}}) + {\upgamma}V\text{(}s^{{\prime}} \text{))}}} \right\}$$
(9)
$$V(s): = \sum\limits_{{s^{{\prime}}}} {P_{\pi} (s,s^{{\prime}})(R_{\pi} (s,s^{{\prime}}) + {\upgamma}V\text{(}s^{{\prime}} \text{))}}$$
(10)

These steps can be done state by state or even all states at once or maybe even specifically more to certain states. The algorithm will converge to a optimal solution unless certain set is excluded all together. To resolve this a further function is defined which corresponds taking the action a and then continuing optimally:

$$Q(s,a) = \sum\limits_{{s^{\prime } }} {P_{a} (s,s^{\prime } )(R_{a} (s,s^{\prime } ) + \gamma (s^{\prime } ))}$$
(11)

This above function is an unknown one but it learns based on (s, a) pairs with their outcomes s′. Thus, there is an array Q and “earns” to update it. This is known as Q-learning. Markov decision processes can be solved even without explicit specification of the transition probabilities using reinforcement learning. In this a simulator aids in accessing transition probabilities, which is generally restarted many times from a uniformly random initial state. As previously mentioned we propose a model by breaking it down to smaller sub module and solve it using MDP to find strategies for the game between the attacker and the IDS. Reinforcement learning enhances the MDP and thus helps in solving for an optimal state (Nash equilibrium). Thus, here we are able to extend upon the game theoretic approaches previously along with a Dirichlet based management system to make the system even more robust.

6 Analysis

We proposed two new models for Intrusion Detection Network and analyzed the previous available models.

Dirichlet Based Collaborative Intrusion Detection System is based using Dirichlet distribution as a trust distributing mechanism. Two types of requests are sent: Intrusion Consultation and Evaluation Requests and Fake Test Message request. A satisfaction function is formed using received feedback, expected answer and difficulty level of request. This helps in formulating a trust value for each peer. This method is generally fast but the accuracy is less than other systems.

Game Theoretical Model for Intrusion Detection System is made using the concept of Nash Equilibrium. Lagrange Multipliers were used to find the Nash Equilibrium state. The optimal strategy involves considering both the gameplay of individual player. This model can only be used when a single attacker is attacking the system. It also considers that the attacker always play using extreme rules that is he can’t change from an attacker to a normal peer.

Game Theoretic Dirichlet Based Collaborative Intrusion Detection System uses the concept of Dirichlet Distribution based trust model along with Nash Equilibrium. We use the trust value which we get from Dirichlet Based Message Request and use them for formulating the Nash Equilibrium’s Lagrange multipliers. This method has better accuracy than both the above methods and it takes into consideration that the attacker can change roles in between.

Extended Game Theoretic Dirichlet Based Collaborative Intrusion Detection System is basically the advanced version of the approach explained previously. It uses Markov Decision Process for modelling the Intrusion Detection system. Submodels are generated and reinforcement learning techniques are applied. This is the best model among all the proposed models. It even takes into consideration multi-attacker scenario. Time complexity of the system is high but it reduces the false positives which is a significant advantage. Due to the aforementioned features we can use it in high secure environments.

The summary of the analysis is done in following Table 1.

Table 1 Comparisons of different methodologies

7 Conclusion

In this paper, we present an extended game theoretic approach along with Dirichlet based trust management model for an intrusion detection system. IDS are an important part of any software system and so also a hard task to make it efficient and robust. We first start off with a trust management model for Collaborative Intrusion Detection System. This model is based on Dirichlet density functions and it takes care of evaluating uncertainty in estimating future behavior of peers in IDS or between IDSes in an IDN. Measurement of this uncertainty aids to deploy an adaptive message exchange rate which then helps in making the system scalable. Along with the “forgetting factor”, it is robust against some common attacks and threats. We then next moved on to game theoretic approaches and showed the existence of Nash equilibrium in the system and its uniqueness. This takes care of the free rider problem in the IDS. Then finally we move on to MDP and reinforcement learning which helps in breaking the model into smaller sub models to find the optimal strategy for the game. All these techniques and methodologies when put together gives us a secure and robust Collaborative Intrusion Detection System which enhances the security of the network and the overall system.