1 Introduction

The fast development of information and communication technologies opens new perspectives for creating Wireless Sensor Network (WSN)-based intelligent services oriented on collecting, sending and processing large amounts of data. This idea, often referred to as Ambient Intelligence and the Internet of Things, relies in particular on different applications of WSNs. A WSN is a network of a large number of tiny computing-communication devices called sensors deployed in some area; they sense the local environment, collect local data depending on the application and send them via a special node called a sink to the external world for processing and decision making.

In many applications, such as monitoring remote and difficult-to-access areas, sensors are equipped with single-use batteries which cannot be recharged. From the point of view of Quality of Service (QoS) of such a WSN, one of the most important issues is its operational lifetime. After deployment (e.g., by an aircraft) at random locations in some area, the sensors should self-organize: recognize their nearest neighbors so as to be able to communicate, and start taking local decisions at subsequent moments of time about turning their batteries on or off to monitor events. These decisions directly influence the lifetime of the network and should be taken in such a way as to maximize it. The problem of lifetime maximization is closely related to the coverage problem. A group of sensors monitoring some area is usually redundant, i.e., more than one sensor typically covers the monitored targets, and the forms of this redundancy can differ. By solving the coverage problem one can therefore also indirectly solve the problem of maximizing WSN lifetime.

There exist a number of algorithms to solve the coverage/lifetime maximization problem. They are classified either as centralized, which assume availability of complete information and usually deliver a solution in the form of a schedule of activities of all sensors during the entire lifetime, or distributed, where a solution is found on the basis of only partial information about the network. Because these problems are known to be NP-complete [4], centralized algorithms are oriented either towards delivering exact solutions for specific cases (see, e.g., [3]) or towards applying heuristics or metaheuristics to find approximate solutions (see, e.g., [7, 14]). The main drawback of centralized algorithms is that a schedule of sensors' activities must be found outside the network and delivered to it before operation starts. Therefore distributed algorithms become more and more popular, because they assume reactivity of sensors in real time and, in contrast to centralized algorithms, they are scalable.

A number of such algorithms based on applying Learning Automata (LA) [6, 11] or Cellular Automata (CA) [13] have been proposed recently. Each of these techniques taken separately has its own advantages and disadvantages. The main disadvantage of classical CA is a lack of reactivity when they are applied to solve optimization problems. On the other hand, a distinctive feature of LA is the ability to interact with an environment [1, 2]. We believe that combining both techniques is a rational approach to solving optimization problems. Some works extending classical CA to second-order CA, which are able to self-adapt, have also appeared recently [5]; they are based on a multi-agent game-theoretic paradigm, and we follow both of these lines of research.

In this paper we propose a novel approach to the problem of coverage/lifetime optimization based on a multi-agent interpretation of the problem and game-theoretic interaction between players participating in a non-cooperative game [10]. Each agent-player is oriented on minimizing its level of redundant coverage of monitored targets shared with other agent-players. The functions of agent-players are performed by deterministic LA. We show that the agent-players are able to find, in a fully distributed way, a solution defined as a Nash equilibrium (NE) [8] corresponding to a balanced coverage of Points of Interest (POIs), which reduces battery expenditure and prolongs the lifetime of WSNs.

The structure of the paper is the following. Section 2 describes the problem of coverage/lifetime optimization in WSNs and the next section presents a multi-agent interpretation of the problem. In Sect. 4 two game-theoretic models related to the studied problem are proposed, and in Sect. 5 these games are experimentally studied with the use of deterministic LA as players. The last section contains conclusions.

2 Sensor Networks and Coverage and Lifetime Problems

It is assumed that a sensor network \(S=\{s_1,s_2,...,s_N\}\) consisting of N sensors is deployed over some area in which M POIs should be monitored. Sensors are distributed randomly, each sensor can monitor POIs within a sensing range \(R_s\) and has a non-rechargeable battery of capacity b. Each sensor can work in one of two modes: an \({ active}\) mode, in which the battery is turned on, a unit of its energy is consumed and the POIs in its sensing range are monitored; and a \({ sleep}\) mode, in which the battery is turned off and the POIs in its sensing range are not monitored.

It is assumed that decisions about turning batteries on/off are taken at discrete moments of time t. It is also assumed that there exists some QoS measure evaluating the performance of the WSN. As such a measure one can accept the value of coverage, defined as the ratio of the number of POIs covered by active sensors to the total number M of POIs. At a given moment of time this ratio should not be lower than some predefined value q \((0<q \le 1)\). The lifetime of the WSN can then be defined as the number of consecutive time steps in which the coverage is not lower than the predefined value q.
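For illustration, the coverage ratio and the resulting lifetime of a given activity schedule can be computed as in the following sketch (hypothetical Python; the data structures for sensor/POI coordinates and the 0/1 activity schedule are our assumptions, not part of the model above).

```python
import math

def coverage(poi_positions, sensor_positions, active, R_s):
    """Fraction of POIs covered by at least one active sensor (sensing range R_s)."""
    covered = 0
    for px, py in poi_positions:
        if any(active[i] and math.hypot(px - sx, py - sy) <= R_s
               for i, (sx, sy) in enumerate(sensor_positions)):
            covered += 1
    return covered / len(poi_positions)

def lifetime(schedule, poi_positions, sensor_positions, R_s, q, b):
    """Number of consecutive time steps with coverage >= q, given a 0/1 activity
    schedule[t][i] and a battery of b energy units per sensor."""
    energy = [b] * len(sensor_positions)
    steps = 0
    for step_actions in schedule:
        active = [a and energy[i] > 0 for i, a in enumerate(step_actions)]
        if coverage(poi_positions, sensor_positions, active, R_s) < q:
            break
        for i, a in enumerate(active):
            energy[i] -= a          # one energy unit per active step
        steps += 1
    return steps
```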

Figure 1(a) shows an example of a sensor network consisting of \(N=4\) sensors. One can notice that if a given sensor is active and some of its neighbor sensors are also active, then a number of POIs in the sensing ranges of these sensors are covered by more than one sensor. This possibly redundant coverage entails extra use of the sensors' batteries, which has a negative impact on the lifetime of the WSN. Figure 1(b) shows an interaction graph depicting the relations between sensors and POIs of the exemplary WSN from Fig. 1(a).

Fig. 1. Example of a sensor network: area view (a), corresponding interaction graph (b).

One can notice that the graph has two types of vertices: black square vertices denote sensors and rectangular vertices denote POIs. A sensor \(s_i\) in the active mode covers \(m_i\) POIs, which can be classified in the following way: POIs which can be covered only by sensor \(s_i\) (\(m_{i0}\) denotes the number of such POIs), POIs which are shared by sensors \(s_i\) and \(s_j\) and can be covered by some or all of these sensors (\(m_{ij}\)), POIs which are shared by sensors \(s_i\), \(s_j\) and \(s_k\) and can be covered by some or all of these sensors (\(m_{ijk}\)), etc.

Sensors which share one or more types of POIs are immediate neighbors. One can see in Fig. 1(b) that e.g., sensors \(s_2\) and \(s_4\) are immediate neighbors because they share \(m_{24}\) POIs, and sensors \(s_1\), \(s_2\) and \(s_3\) are immediate neighbors because they share \(m_{123}\) POIs.
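The shared-POI counts \(m_{i0}, m_{ij}, m_{ijk}, ...\) and the immediate-neighbor relation of the interaction graph can be derived directly from sensor and POI coordinates, for example as in the following sketch (hypothetical Python; the function names and data layout are our assumptions).

```python
import math
from collections import defaultdict

def shared_poi_counts(poi_positions, sensor_positions, R_s):
    """Map each set of covering sensors to the number of POIs it covers,
    e.g. {frozenset({1}): m_10, frozenset({1, 2}): m_12, ...} (0-based indices)."""
    counts = defaultdict(int)
    for px, py in poi_positions:
        owners = frozenset(i for i, (sx, sy) in enumerate(sensor_positions)
                           if math.hypot(px - sx, py - sy) <= R_s)
        if owners:
            counts[owners] += 1
    return counts

def immediate_neighbors(counts, n_sensors):
    """Sensors sharing at least one POI are immediate neighbors."""
    neigh = {i: set() for i in range(n_sensors)}
    for owners in counts:
        for i in owners:
            neigh[i] |= owners - {i}
    return neigh
```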

3 Multi-agent Approach to WSN Lifetime Optimization

Let us assume that each sensor \(s_i\) of the WSN is controlled by an agent \(A_i\) of a multi-agent system consisting of N agents. Each agent has two alternative decisions (actions): \( \alpha _i = 0\) (battery is turned off) and \( \alpha _i = 1\) (battery is turned on), and the neighbor relations between agents are defined by an interaction graph (see Sect. 2). According to the interaction graph, an agent \(A_i\) has \(k_i\) immediate neighbors and receives some reward \(rev_i()\) which depends on its own decision and the decisions of its neighbors (see Eq. (1)):

$$\begin{aligned} rev_i(\alpha _{i},\alpha _{neigh_1}, \alpha _{neigh_2}, ..., \alpha _{neigh_{k_i}}) = {\left\{ \begin{array}{ll} rev_i^{off}() - pen_i^{off}(), &{} \text {if } \alpha _i = 0, \\ rev_i^{on}() - tax_i^{bat}(), &{} \text {if } \alpha _i = 1, \end{array}\right. } \end{aligned}$$
(1)

where:

  • \( \alpha _{i},\alpha _{neigh_1}, \alpha _{neigh_2}, ..., \alpha _{neigh_{k_i}}\) – decisions of agent \(A_i\) and its neighbors;

  • \(k_i\) – the number of neighbors of sensor \(s_i\);

  • \(rev_i^{off}\) – a reward for shared POIs being covered by active neighbor sensors while sensor \(s_i\) is inactive;

  • \(pen_i^{off}\) – a penalty for not covering POIs which are in the sensing range of the inactive sensor \(s_i\);

  • \(rev_i^{on}\) – a reward for covering, by the active sensor \(s_i\), the POIs which are in its range;

  • \(tax_i^{bat}\) – a tax for the use of the battery of sensor \(s_i\).

A more detailed formulation of Eq. (1) (see Sect. 4) shows that an agent \(A_i\) can receive some reward even if it is inactive \((\alpha _{i}=0)\) and saves its own battery. This happens when some neighbor sensors are active and cover the shared POIs, and when the number of uncovered POIs does not exceed some threshold value related to the predefined coverage parameter q, so that the penalty for them is lower than the obtained reward.

On the other hand, agent \(A_i\) receives a reward when it spends the energy of its battery, but this reward is lowered when some other neighbor sensors are active and cover the shared POIs. The purpose of each agent is to maximize its total reward, which corresponds to finding a local trade-off between the requested level of coverage and the expenditure of battery power. This behavior of the agents is in line with the main goal of this work: finding a global trade-off between the requested level of QoS and the minimization of battery expenditure, so as to maximize the lifetime of the WSN.

There exist many ways to organize the work of the agents to realize this global goal. In this paper we propose a game-theoretic approach based on non-cooperative games, where agent-players compete to achieve their own goals and the solution of the optimization problem is converted into the problem of players searching for a Nash equilibrium (NE) in a game. A similar game-theoretic approach has recently been successfully applied in the context of solar-powered WSNs [9].

4 Game-Theoretic Approach to WSN Lifetime Optimization

One of the main sources of imbalance between the level of coverage of POIs and the spending of battery power is shared POIs. In particular, one can see in Fig. 1(b) two extreme patterns of sharing: the same set of POIs can be shared by some (perhaps huge) number of sensors (see \(m_{123}\)), and on the other hand, different pairs of sensors can share different sets of POIs (see \(m_{12}, m_{13}, m_{23}\)).

These situations correspond to two different models with different expected solutions; they are shown in Figs. 2 and 3, respectively, together with the corresponding interaction graphs.

Fig. 2. Model 1: a sensor network (left), corresponding interaction graph (right).

4.1 Model 1: Leader Election Game

In this model (see Fig. 2) it is assumed that N sensors controlled by the corresponding agents share a common set of POIs. The reward obtained by a single agent, \(rev_i(\alpha _{1}, \alpha _{2}, ..., \alpha _{i}, ..., \alpha _{N})\), depends on the actions of all agents and is evaluated according to Eq. (2):

$$\begin{aligned} rev_i(\alpha _{1}, \alpha _{2}, ..., \alpha _{i}, ..., \alpha _{N}) = {\left\{ \begin{array}{ll} rev_i^{off}(m_i^{shared\_on}), &{} \text {if } \alpha _i = 0, \\ rev_i^{on}(m_i), &{} \text {if } \alpha _i = 1, \end{array}\right. } \end{aligned}$$
(2)

where:

  • \(rev_i^{off}() = C_{rev}^{off} \times \frac{m_i^{shared\_on}}{M}\),

  • \(rev_i^{on}() = C_{rev}^{on} \times \frac{m_i}{M(N_{ij}^{on} +1)}=\frac{C_{rev}^{on}}{N_{ij}^{on} +1}\) (since in this model \(m_i = M\)),

    where:

    • \(m_i^{shared\_on}\) – a number of POIs which are in sensing range of inactive sensor \(s_i\) and shared with active neighbor sensors;

    • \(m_i\) – a number of POIs which are in sensing range of sensor \(s_i\);

    • M – a total number of POIs;

    • \(N_{ij}^{on}\) – a number of active neighbors of sensor \(s_i\);

    • \(C_{rev}^{off}, C_{rev}^{on}\) – model constants.
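As an illustration only, a possible coding of Eq. (2) is sketched below (hypothetical Python; it relies on the fact that in Model 1 every sensor covers all M POIs, so \(m_i = M\) and \(m_i^{shared\_on}\) equals M when at least one neighbor is active and 0 otherwise).

```python
def model1_reward(i, actions, M, C_rev_off, C_rev_on):
    """Reward rev_i of player i under Eq. (2); actions is a 0/1 list for all N players."""
    n_on = sum(actions) - actions[i]            # N_ij^on: number of active neighbors of s_i
    if actions[i] == 0:
        m_shared_on = M if n_on > 0 else 0      # shared POIs covered by active neighbors
        return C_rev_off * m_shared_on / M
    return C_rev_on * M / (M * (n_on + 1))      # = C_rev_on / (N_ij^on + 1)
```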

In the NE of this game the expected rational behavior of players is that only one agent-player selects action \(\alpha _{i} = 1\) while the remaining players select action \(\alpha _{j} = 0\) \((i \ne j)\). Therefore this game will be further referred to as the Leader Election Game. From the definition of NE, the following relations between the payoffs of the player selecting action \(\alpha _{i} = 1\) and the players selecting action \(\alpha _{j} = 0\) \((i \ne j)\) should be fulfilled:

$$\begin{aligned} \begin{array}{ll} rev_i^{on}() > rev_j^{off}(), &{} \text {for } N_{ij}^{on} = 0, \\ rev_j^{off}() > rev_i^{on}(), &{} \text {for } N_{ij}^{on} > 0, \end{array} \end{aligned}$$
(3)

Thus, we obtain:

$$\begin{aligned} \begin{array}{ll} C_{rev}^{on} > C_{rev}^{off}, &{} \text {for } N_{ij}^{on} = 0, \\ C_{rev}^{off} > \frac{C_{rev}^{on}}{N_{ij}^{on} +1}, &{} \text {for } N_{ij}^{on} > 0. \end{array} \end{aligned}$$
(4)

Let us assume that:

$$\begin{aligned} a = C_{rev}^{off}, b = C_{rev}^{on}, c \le 0, \end{aligned}$$
(5)

so:

$$\begin{aligned} \begin{array}{ll} b > a, &{} \text {for } N_{ij}^{on} = 0, \\ a > \frac{b}{N_{ij}^{on} +1}, &{} \text {for } N_{ij}^{on} > 0. \end{array} \end{aligned}$$
(6)

Finally, for Model 1 we can construct the following payoff function \(u_i^1(\alpha _{1}, \alpha _{2}, ..., \alpha _{N})\) of the game (see Table 1):

Table 1. Payoff function \(u_i^1(\alpha _{1}, \alpha _{2}, ..., \alpha _{N})\) for the \(i\)-th player.

We accept the following values for the parameters of the game: \(a = 1, b = 1.5\) and \(c = 0\). The table shows the payoff of the \(i\)-th player selecting either the action “0” or the action “1” as a function of the number of remaining players selecting the action “1”. If the \(i\)-th player selects “0” and none of the remaining players is “on”, then the player receives the payoff \(c=0\), while if at least one of the remaining players is “on” the player receives the payoff \(a=1\).

If the \(i\)-th player selects “1” and all remaining players are “off”, the player receives the payoff \(b=1.5\), but if some of the remaining players are “on” it receives a lower payoff which depends on the number of players being “on”. Let us consider a two-player (\(N = 2\)) game. The following action profiles exist in the game: (0, 0), (0, 1), (1, 0) and (1, 1). Consider the action profile (0, 1): player 1 receives the payoff \(u_1^1(0,1) = 1\) and player 2 the payoff \(u_2^1(0,1) = 1.5\).

If player 1 changes its action, its payoff is lowered to \(u_1^1(1,1) = 0.75\); similarly, player 2 would lower its payoff by switching to “0”. It means that no player has a reason to change its action, and the considered action profile is a NE point. This NE provides a perfect balance between the coverage of POIs and the spending of battery power, which maximizes the lifetime of the considered network.
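The NE argument for the two-player case can also be checked mechanically. The sketch below (hypothetical Python, using the payoff pattern of Table 1 with \(a=1, b=1.5, c=0\)) enumerates all action profiles and tests the NE condition by checking unilateral deviations.

```python
A, B, C = 1.0, 1.5, 0.0   # a, b, c from Eq. (5)

def payoff(my_action, others_on):
    """Payoff of a player given its action and the number of remaining players 'on'."""
    if my_action == 0:
        return C if others_on == 0 else A
    return B / (others_on + 1)

def is_nash(profile):
    """No player can improve its payoff by unilaterally switching its action."""
    for i, a_i in enumerate(profile):
        others_on = sum(profile) - a_i
        if payoff(1 - a_i, others_on) > payoff(a_i, others_on):
            return False
    return True

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, is_nash(p))     # only (0, 1) and (1, 0) are equilibria
```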

4.2 Model 2: Synchronized Local Leader Election Game

In this model (see Fig. 3) local sets of POIs are shared by neighbor sensors. The reward of agent \(A_i\) depends on its action \(\alpha _{i}\) and the actions of its two nearest neighbors, \(rev_i(\alpha _{i\ominus 1}, \alpha _{i}, \alpha _{i\oplus 1})\), and is calculated according to Eq. (7):

Fig. 3. Model 2: a sensor network (left), corresponding interaction graph (right).

$$\begin{aligned} rev_i(\alpha _{i\ominus 1}, \alpha _{i}, \alpha _{i\oplus 1}) = {\left\{ \begin{array}{ll} rev_i^{off}() - pen_i^{off}(), &{} \text {if } \alpha _i = 0, \\ rev_i^{on}(), &{} \text {if } \alpha _i = 1, \end{array}\right. } \end{aligned}$$
(7)

where:

  • \(rev_i^{off}() = C_{rev}^{off} \times \frac{\sum _j m_{ij}^{shared\_on}}{M}\),

  • \(\begin{aligned} pen_i^{off}() = {\left\{ \begin{array}{ll} C_{pen}^{off} \times \frac{m_i - \sum _j m_{ij}^{shared\_on}}{M}, &{} \text {if } (m_i - \sum _j m_{ij}^{shared\_on}) \ge m_i(1-q), \\ 0, &{} \text {otherwise,} \end{array}\right. } \end{aligned} \)

  • \(rev_i^{on}() = C_{rev}^{on} \times \left( \frac{m_i - \sum _j m_{ij}^{shared\_on}}{M} + \frac{\sum _j m_{ij}^{shared\_on}/(N_{ij}^{on} +1)}{M}\right), \)

    where:

    • \(rev_i^{off}()\) – a reward of inactive agent \(A_i\) for covering shared POIs by active neighbor sensors;

    • \(pen_i^{off}() \) – a penalty of the inactive agent \(A_i\) for not covering its POIs when their number reaches or exceeds the threshold value \(m_i(1-q)\);

    • \(rev_i^{on}()\) – a reward of active agent \(A_i\) for covering its POIs and redundant covering by active neighbor sensors.
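A possible coding of the reward of Eq. (7) is sketched below (hypothetical Python; \(N_{ij}^{on}\) is taken, as in Model 1, to be the total number of active neighbors of \(s_i\), and the per-neighbor shared counts \(m_{ij}\) and the neighbor states are assumed to be known to the agent).

```python
def model2_reward(action, m_i, shared, neighbor_on, M, q,
                  C_rev_off, C_rev_on, C_pen_off):
    """Reward of agent A_i under Eq. (7).
    shared[j]      -- m_ij, the number of POIs shared with neighbor j
    neighbor_on[j] -- True if neighbor j is currently active"""
    shared_on = sum(m for j, m in shared.items() if neighbor_on[j])
    n_on = sum(1 for on in neighbor_on.values() if on)          # N_ij^on
    if action == 0:
        rev = C_rev_off * shared_on / M
        uncovered = m_i - shared_on
        pen = C_pen_off * uncovered / M if uncovered >= m_i * (1 - q) else 0.0
        return rev - pen
    own = (m_i - shared_on) / M          # POIs covered by s_i and by no active neighbor
    redundant = sum(m / (n_on + 1)
                    for j, m in shared.items() if neighbor_on[j]) / M
    return C_rev_on * (own + redundant)
```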

From the point of view of rational players, every second player in a ring consisting of N players (N an even number) should select action “1” while the remaining players should select action “0”. It means that the following relations between the rewards of a player i in the game should be fulfilled in order to achieve a NE:

  • \(b = rev_i^{on}(0,1,0) > a = rev_i^{off}(1,0,1)\),

  • \(a = rev_i^{off}(1,0,1)> d_1 = rev_i^{off}(1,0,0) = rev_i^{off}(0,0,1) > c\),

  • \(a = rev_i^{off}(1,0,1)> d_2 = rev_i^{on}(1,1,0) = rev_i^{on}(0,1,1) > c\),

  • \(a = rev_i^{off}(1,0,1)> d_3 = rev_i^{on}(1,1,1) > c\),

  • \(d_2 = rev_i^{on}(0,1,1)> d_1 = rev_i^{off}(0,0,1) > c\).

The payoff function \(u_i^2(\alpha _{i\ominus 1}, \alpha _{i},\alpha _{i\oplus 1})\) (see, Table 2) fulfills these requirements.

Table 2. Payoff function \(u_i^2(\alpha _{i\ominus 1}, \alpha _{i}, \alpha _{i\oplus 1})\) for the \(i\)-th player.

5 Iterated Games of Learning Automata: Experimental Study

In this section we study dynamic games of deterministic \(\epsilon \)-LA [12, 15] acting as players in the iterated games presented in Sect. 4. An \(\epsilon \)-LA has d actions and acts in a deterministic environment \(c = (c_1, c_2, ...,c_d)\), where \(c_k\) stands for the reward obtained for its action \(\alpha _{k}\). It also has a memory of length H. Whenever the automaton generates an action, the environment sends it a payoff in a deterministic way. The objective of the reinforcement learning algorithm represented by the \(\epsilon \)-automaton is to maximize its payoff in the environment where it operates.

Fig. 4. Model 1: the average team payoff vs \(\epsilon \) and H for \(N=2\) (left) and \(N=32\) (right).

Fig. 5. Model 1: typical run (\(N=32\)): the average team payoff (left) and the number of active players in the game (right).

Fig. 6. Model 2: typical run (\(N=32\)): the average team payoff (left) and the number of active players in the game (right).

The automaton remembers its last H actions and the corresponding payoffs. As the next action the \(\epsilon \)-automaton chooses its best action from the last H games (rounds) with probability \(1-\epsilon \) \((0 < \epsilon \le 1)\), and with probability \(\epsilon /d\) each of its d actions. In our case \(d=2\) (sleep or active). The purpose of this study was to find out experimentally whether and under which conditions a team of such players is able to find, in a fully distributed way, the solutions of the games represented by NEs.
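A minimal sketch of such a player is given below (hypothetical Python; it follows the description above: remember the last H (action, payoff) pairs, replay the best remembered action with probability \(1-\epsilon \), and otherwise pick one of the d actions uniformly at random).

```python
import random
from collections import deque

class EpsilonLA:
    """Deterministic epsilon-learning automaton with d actions and memory of length H."""
    def __init__(self, d=2, H=8, eps=0.001):
        self.d, self.eps = d, eps
        self.memory = deque(maxlen=H)          # last H (action, payoff) pairs

    def choose(self):
        # exploration: each of the d actions with probability eps/d
        # (also used while the memory is still empty)
        if not self.memory or random.random() < self.eps:
            return random.randrange(self.d)
        # exploitation: the best action observed in the last H rounds
        return max(self.memory, key=lambda ap: ap[1])[0]

    def update(self, action, payoff):
        self.memory.append((action, payoff))
```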

We start the overview of the conducted experiments with the results for Model 1. We studied the behavior of teams consisting of \(N=2, 4, 16\) and 32 players for different values of \(\epsilon \) and H. Some results of this study are shown in Fig. 4. One can see (Fig. 4 (left)) that for \(N=2\) the ability to reach the NE for a given value of H depends on the value of \(\epsilon \). A lower value of \(\epsilon \) results in a higher average team payoff, which for the NE is equal to 1.25. Increasing the value of \(\epsilon \) also increases the chance of disrupting the NE. The figure also shows that this ability depends on the value of H. The optimal value of H is around 4–8. Too small a value of H (\(H=2\)) makes the team very unstable, while higher values of H reduce the ability to achieve the NE.

For increasing values of N the dependence on \(\epsilon \) and H is similar (see Fig. 4 (right)) to that for small values of N, but the team of players is more stable over the whole range of values of H. Figure 5 (left) shows a typical run (\(N=32\)) of the game as a function of the number of rounds. One can see that the team of LA achieves the NE after around 37 iterations. The optimal number of active player-sensors is equal to 1 (see Fig. 5 (right)). One can also see that while playing the game corresponding to the NE there is a small probability that the player-leader suddenly and temporarily changes its action.
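To make the experimental setup concrete, the following self-contained sketch replays the iterated Model 1 (Leader Election) game with a team of \(\epsilon \)-LA (hypothetical Python; the constants and the printed convergence behavior are illustrative only and do not reproduce the exact experimental code or figures of this paper).

```python
import random
from collections import deque

A, B, C = 1.0, 1.5, 0.0            # a, b, c of Table 1
N, EPS, H, ROUNDS = 32, 0.001, 8, 200

def payoff(action, others_on):     # payoff pattern of Table 1
    if action == 0:
        return C if others_on == 0 else A
    return B / (others_on + 1)

memories = [deque(maxlen=H) for _ in range(N)]   # per-player (action, payoff) memory

def choose(mem):
    if not mem or random.random() < EPS:
        return random.randrange(2)               # explore
    return max(mem, key=lambda ap: ap[1])[0]     # best action of the last H rounds

for t in range(ROUNDS):
    actions = [choose(mem) for mem in memories]
    total_on = sum(actions)
    for i, mem in enumerate(memories):
        mem.append((actions[i], payoff(actions[i], total_on - actions[i])))
    avg_payoff = sum(mem[-1][1] for mem in memories) / N
    print(t, total_on, round(avg_payoff, 3))     # the NE corresponds to exactly one active player
```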

The results of the experimental study of Model 2 are similar to those of Model 1. The ability of a team of LA to achieve the NE depends on the values of \(\epsilon \) and H in a similar way (not shown here) as for Model 1, except that only for small values of H and relatively large values of \(\epsilon \) can the system lose its stability for any number of LA players. Figure 6 shows an example of a run of the game with \(N=32\) players for \(\epsilon =0.001\) and \(H=8\). One can see (Fig. 6 (left)) that the team of LA is able to reach the corresponding NE relatively fast, providing the highest average team payoff. Figure 6 (right) shows that the optimal number of active sensor-players in the NE is equal to 16. It may happen that the number of active players in the game changes suddenly and temporarily.

6 Conclusion

We have proposed an approach to lifetime optimization in WSNs which replaces the problem of global optimization by the problem of searching for a NE by a team of players participating in a non-cooperative game. We analyzed the relations between the coverage problem and the lifetime optimization problem, selected two building blocks, i.e., the basic sources of imbalance between the level of coverage of POIs and battery expenditure, and proposed game-theoretic models for their solution.

We have shown that in iterated games a team of deterministic \(\epsilon \)-LA was able to find global solutions in a fully distributed way, demonstrating the possibility of self-organization in WSNs oriented on solving the lifetime optimization problem. We believe that combining this approach with CA will stimulate the development of second-order CA able to solve optimization problems.