1 Introduction

Wireless sensor networks (WSNs) have recently become a research hotspot [1], for example, for gathering event information from animal habitats, environmental monitoring, etc. [2]. To meet the global challenge of dramatically reducing energy consumption while handling exponentially growing data traffic, creating green wireless sensor networks (GWSNs) has become an important issue [3, 4]. GWSNs normally consist of a large number of distributed sensors, each of which is expected to work on batteries for several months to a few years without replenishment. Thus, energy efficiency becomes a critical issue in GWSNs [5–8].

In GWSNs, tracking multiple events has been one of the most important research topics of the last two decades [6]. Since each sensor in a GWSN has a limited range, it can detect events only within its sensing range and communicate only with other sensors within its communication range. To reduce energy consumption, green networking techniques must be implemented that adapt the current network according to demand. Sleep scheduling is a standard approach for balancing energy consumption: under this approach, sensors may be put into a sleep mode. It has been estimated that by setting some sensors into sleep mode, more than 23 % of the total energy in a network can be saved [7]. There are two primary approaches to the sleep scheduling problem. One employs a duty cycle to conserve energy and extend the network lifetime: sensors stay awake only a fraction of the time, and sensing and communication are performed only during this awake time [5, 6, 8–16]. The other addresses lifetime maximization as an optimization of the energy consumption for a given network; the most commonly used optimization technique is the Markov decision process (MDP) [3, 17–22].

This paper attempts to examine the fundamental theory of sleeping in GWSNs for tracking, as opposed to merely designing an algorithm for it. An optimization problem is formulated in the form of a partially observable Markov decision process (POMDP) [23], based on which a sleep scheduling algorithm with online planning (Dec-POP-SSA) is proposed to solve the sleep scheduling problem.

The main contributions of this work are listed as follows:

  1. We propose a novel scheme for sleep scheduling based on the decentralized partially observable Markov decision process (Dec-POMDP). Since the state space and the reward are hard to obtain in a model-free environment, quasi-Monte Carlo (QMC) sampling is applied to sample the state space so that the belief state can be acquired in real time, and the reward is evaluated by the tracking reward together with the coverage and connectivity intensities.

  2. A sleep scheduling algorithm with online planning (Dec-POP-SSA) based on Dec-POMDP is presented. Instead of producing the entire plan, Dec-POP-SSA finds actions only for the current step. These actions are kept in belief pools: each cluster head maintains a sub-belief pool, and the sensors in the same cluster collaborate to reduce network traffic through selective communication.

The remainder of this paper is organized as follows: Sect. 2 introduces the related works. Section 3 presents the problem statement of sleep scheduling. Section 4 is devoted to the Dec-POP-SSA algorithm. The theoretical and numerical analyses are shown in Sects. 5 and 6, respectively. Finally, Sect. 7 concludes this paper.

2 Related works

There are two primary approaches to the sleep scheduling problem in the earlier literature: the first is to design a sleep scheduling method with a duty-cycle scheme, and the second is to utilize optimization techniques. For the first approach, Zhao et al. [9] present virtual backbone scheduling (VBS) for WSNs, which forms multiple overlapping backbones that work alternately to prolong the network lifetime. Nan et al. [10] present a coverage-guaranteed sleep/wake scheduling scheme (CDSWS). Liu et al. [11] propose a randomized scheduling scheme with guaranteed connectivity (RSGC) for WSNs, which takes into consideration the maintenance of both the coverage ratio and the connectivity of each group. Wang et al. [12] give a structure named the two-hops-cluster scheme, which also maintains coverage and connectivity in the network. Ding et al. [13] establish an adaptive partitioning scheme called the connectivity-based partition approach (CPA), which divides the network into several groups, of which only one node per group is selected to be active to form a backbone network; CPA ensures the effective connectivity of the network. Yu et al. [14] propose a distributed nucleus algorithm (DNA) for domatic partition, which guarantees coverage. Kim et al. [6] address the directional cover and transmission problem of organizing directional sensors into a group of non-disjoint subsets to extend the network lifetime. Jabbar et al. [5] propose a cluster-design algorithm for lifetime improvement of wireless sensor networks. Censor-Hillel et al. [15] give a scheme based on network connectivity. Zhang and Li [16] focus on employing different scheduling strategies for border nodes and internal nodes, which also guarantees coverage. Furthermore, the duty-cycle scheme can use powerful tools such as queueing theory, game theory and parallelization methodologies [24–33].

It is easy to see that the duty-cycle approach aims to ensure coverage and connectivity. In every time slot, some sensors need to be woken up, and each sensor is awake for a predetermined time slot; furthermore, this approach does not use information about the locations of the events that happen in the network. Compared with the duty-cycle approach, the second approach addresses sleep scheduling with POMDP techniques, which depend on the location information of the events. These approaches yield a tradeoff curve between tracking reward and energy consumption and can save more energy than the duty-cycle approaches.

Fuemmeler and Veeravalli [17] identify two suboptimal sleeping policies, the first cost reduction (FCR) solution and the Q-cost-based MDP (QMDP) solution, which are applied to optimize the tradeoff between energy cost and tracking error without assuming the use of wake-up radios. They assume that sleeping sensors cannot be woken up externally, so internal timers determining when the sensors wake up need to be set; the control actions therefore correspond to the sleep durations of awake sensors. Atia et al. [20, 21] study a single-object tracking problem in which sensors can be turned on or off at consecutive time slots to conserve energy, based on the framework of a POMDP. Mihaylov et al. [22] present a decentralized reinforcement learning (RL) approach for self-organizing wake-up scheduling in wireless sensor networks. Fuemmeler et al. [19] establish an approach that builds on the policies designed in [17]; they use Monte Carlo simulation and learning algorithms to compute the policy parameters for the special case of a discrete state space with continuous Gaussian observations. They also prove that the POMDP approach can result in better asymptotic behavior as the size of the network becomes large. Fuemmeler and Veeravalli extend these results to multi-object tracking in [18].

A comprehensive overview of these improvements is shown in Table 1. It is clear that coverage can be guaranteed by the duty-cycle approaches; however, sleep scheduling through POMDP techniques is more effective for event tracking.

Table 1 Sleep scheduling algorithms in WSNs

3 Problem formulation

3.1 Background of Dec-POMDP

A Markov decision process (MDP) provides a unified model for planning under an uncertain environment [23]. The key decision factor in an MDP is the state of the system, which contains all the information the agent needs to choose its current strategy. That is to say, the agent can make optimal decisions according to the current state alone, without having to consider other information; this is the Markov property of the MDP. In sleep scheduling problems, a sensor does not have direct access to the full system state; it receives only local feedback about the state, which is called an observation. The partially observable Markov decision process (POMDP) extends the MDP model to handle state uncertainty by incorporating observations and a probabilistic model of their occurrence, which makes it more suitable for sleep scheduling problems in GWSNs. The resulting probability distribution over states is called the belief state.

Different formal models for POMDP policies have been proposed over the last decade. The main bottleneck is the well-known curse of dimensionality, and sleep scheduling problems in GWSNs suffer from it as well. First, in addition to the sensors' states, the observed decision sequence grows exponentially with the number of steps. Second, each individual sensor may have different partial information about the other sensors and about the state of the whole network; only with such information can the nodes coordinate with each other and cooperate to complete common sleep scheduling tasks. However, it is sometimes difficult to evaluate the state of a sensor quantitatively, especially in an uncertain environment. Dec-POMDP is a mathematical framework that is useful for modeling resource control problems in which decision making is decentralized [34]. Dec-POMDP offers a general approach to resource management in a multi-node scenario with decentralized control, whose goal is to solve the problem of common multi-node teamwork.

In cooperative decision-making settings such as sleep scheduling problems, the sensors have to wake up according to a plan that maximizes the expected reward of the network. Planning in Dec-POMDP takes place either offline or online. In offline planning, sleep scheduling plans are distributed to each sensor and executed by each sensor based on its own information, and the whole plan needs to be generated at once; hence the planning phase can be centralized as long as the execution is decentralized. Online planning algorithms, in contrast, interleave planning with execution: only the current step needs to be planned from the sensors' states, and the sensors can share information with each other by communication.

Since only a very limited time is available for distributed coordination among the sensors in online planning, online planning algorithms are usually combined with some offline algorithm. In offline planning, the policy is often represented as one full tree, and the target of each iteration step of the sleep scheduling is to choose the optimal subtree for each branch. In online planning, the historical information of the sensors' policies is expressed in sequential form, and the goal is to select an optimal operation for each piece of historical information. In this paper, the Dec-POP-SSA algorithm aims to solve the coordination between sensors and the efficient utilization of limited resources by online planning. The key idea of the algorithm is to let each sensor calculate the joint policy independently and to maximize the decision-making reward.

3.2 Formal model of Dec-POMDP

Some notation that will be used throughout this paper is given first; the important symbols and their definitions are collected in Table 2.

Table 2 Notation

Definition 1

(Dec-POMDP) [23] A decentralized partially observable Markov decision process is a tuple

$$\begin{aligned} \langle I,S,\{A_{i}\},\{\Omega _{i}\},P,O,R,b^{0} \rangle , \end{aligned}$$

where \(I\) is the finite set of sensors (agents), \(S\) is the finite set of states, \(R\) is the reward function and \(b^{0}\) is the initial belief state. \(\overrightarrow{A }=\times _{i \in I} A_{i}\) is the set of joint actions, \(a_{i}\in A_{i} \), and \(\overrightarrow{a} = \langle a_{1},\ldots ,a_{n}\rangle \) denotes a joint action. \(\overrightarrow{\Omega }=\times _{i \in I}\Omega _{i}\) is the set of joint observations, \(o_{i}\in \Omega _{i} \), and \(\overrightarrow{o} = \langle o_{1},\ldots ,o_{n}\rangle \) denotes a joint observation. \(P(s'|s, \overrightarrow{a})\) denotes the probability that taking joint action \(\overrightarrow{a}\) in state \(s\) results in a transition to state \(s'\); \(P\) describes the stochastic influence of actions on the environment. \(O(\overrightarrow{o}|\overrightarrow{a}, s')\) denotes the probability of observing joint observation \(\overrightarrow{o}\) after taking joint action \(\overrightarrow{a}\) and reaching state \(s'\); \(O\) explains how the sensors perceive the state of the environment.
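To make the tuple concrete, the following is a minimal Python sketch of a Dec-POMDP container whose fields mirror the notation above. The type aliases and field names are illustrative only, and the transition, observation and reward entries are placeholders, since these functions are not known explicitly in the model-free setting assumed later in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

State = int          # index into the finite state set S
Action = int         # index into a sensor's action set A_i
Observation = int    # index into a sensor's observation set Omega_i

@dataclass
class DecPOMDP:
    """Container mirroring the tuple <I, S, {A_i}, {Omega_i}, P, O, R, b0>."""
    sensors: List[int]                          # I: the set of sensors (agents)
    states: List[State]                         # S: the finite set of states
    actions: Dict[int, List[Action]]            # A_i: per-sensor action sets
    observations: Dict[int, List[Observation]]  # Omega_i: per-sensor observation sets
    # P(s' | s, joint_a): transition model (unknown in the model-free setting)
    transition: Callable[[State, Sequence[Action], State], float]
    # O(joint_o | joint_a, s'): observation model (also unknown here)
    observation_fn: Callable[[Sequence[Observation], Sequence[Action], State], float]
    # R(s, joint_a): immediate joint reward
    reward: Callable[[State, Sequence[Action]], float]
    b0: Dict[State, float]                      # initial belief state b^0
```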

For the sleep scheduling problem, each sensor can be in one of two states: awake or asleep. A sensor in the awake state consumes more energy than one in the sleep state; however, event sensing can be performed only when a sensor is awake. An awake sensor only detects whether one or more events are within its range; it cannot detect the exact number of events present or which events are present. It is presumed that a sensor that wakes up remains awake for one time unit, during which the following actions are taken:

  1. The sensor sends a binary observation to the central unit that indicates whether one or more events are within its range;

  2. The sensor receives a new sleep time from the central controller.

Since wake-up signals are impractical, a timer is initialized at each sensor and decremented by one time unit at each time step; the sensor wakes up when the timer expires, and this is the only mechanism for waking up a sensor. A state summarizes all the relevant features of the dynamical system and satisfies the Markov property. That is, the probability of the next state depends only on the current state and the joint action, not on the previous states and joint actions:

$$\begin{aligned} P(s^{t+1}|s^{0}, \overrightarrow{a}^{0},\ldots , s^{t-1}, \overrightarrow{a}^{t-1}, s^{t} , \overrightarrow{a}^{t} ) = P(s^{t+1}|s^{t} , \overrightarrow{a}^{t} ). \end{aligned}$$
(1)

Let \(u^{k}_{l}\) denote the sleep time input supplied to sensor \(l\) at time \(k\), which can be written as

$$\begin{aligned} u^{k}_{l}=\mu _{k}(p^{k}_{l},r^{k}_{l}). \end{aligned}$$
(2)

The value of the sleep timer of sensor \(l\) at time \(k\) is denoted by \(r^{k}_{l}\); the evolution of the residual sleep time can be described as

$$\begin{aligned} r^{k+1}_{l}=(r^{k}_{l}-1)1\!\!1_{\{r^{k}_{l}>0\}}+u^{k}_{l}1\!\!1_{\{r^{k}_{l}=0\}}. \end{aligned}$$
(3)

There are \(n\) sensors in the network, and Eq. (3) holds for each time slot \(k\) and each sensor \(l\in \{1,\ldots ,n\}\). Equation (3) is composed of two parts. The first implies that if the sensor is currently asleep, which is indicated by \(1\!\!1_{\{r^{k}_{l}>0\}}\), then the sleep timer is decremented by 1. The second implies that if the sensor is currently awake, which is indicated by \(1\!\!1_{\{r^{k}_{l}=0\}}\), then the sleep timer is reset to the current sleep time input for that sensor.
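The timer update of Eq. (3) reduces to a simple branch on whether the sensor is asleep or awake. A minimal sketch is given below; the function name and the example values are illustrative.

```python
def update_residual_sleep_time(r_kl: int, u_kl: int) -> int:
    """Eq. (3): decrement the timer while asleep; reset it to the new
    sleep input u_l^k once the sensor is awake."""
    if r_kl > 0:      # indicator 1{r_l^k > 0}: sensor is asleep
        return r_kl - 1
    else:             # indicator 1{r_l^k = 0}: sensor is awake
        return u_kl   # timer reset to the current sleep time input

# Example: an awake sensor (r = 0) repeatedly assigned a 3-slot sleep time
r = 0
for k in range(5):
    r = update_residual_sleep_time(r, u_kl=3 if r == 0 else 0)
    # r evolves as 3, 2, 1, 0, 3, ...
```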

Let \(L_{k}\) denote the events that occur within the range of some sensors at time slot \(k\). Unfortunately, \(L_{k}\) is known only when the event locations are tracked precisely. Thus, we have a dynamical system with partially observed state information. The observation available to the central unit at time \(k\) is denoted by

$$\begin{aligned} \lambda _{k}=\left[ {\begin{array}{*{20}{c}} {{p^{k}_{1}}} , {{r^{k}_{1}}} \\ \ldots \\ {{p^{k}_{n}}} , {{r^{k}_{n}}} \\ \end{array}} \right] \!, \end{aligned}$$
(4)

where \(p^{k}_{l}\) is a vector of observations with

$$\begin{aligned} p^{k}_{l}=p_{l}(b^{k}=b). \end{aligned}$$
(5)

To provide a method for centralized control, an additional entity, a central controller, is assumed. The central controller keeps track of the state of the network and assigns sleep times to sensors that are awake. After the initial distribution of sleep times to every sensor, the central controller maintains the belief pool, while the policy computation of each sensor is distributed. The state information of the sensors is referred to as the sensor status history; it is the sequence \(S\) that includes the state behavior \(\lambda _{k}\), the reward \(R\), the observation \(O\), the feedback response and so on. In detail, \(S=\{s_{i}|i=1,2\ldots ,n\}\), where \(s_{i}=\{(\lambda ,R,O,\ldots )_{1},\ldots ,(\lambda ,R,O,\ldots )_{t}\}\). Within a given time slot, the possible environmental states of the whole network are

$$\begin{aligned} S_{i}(t)=s_{1}(t)s_{2}(t)\cdots s_{n}(t),s_{j}(t)\in \{0,1\}, \end{aligned}$$

where \(i=1,2,\ldots ,M\); \(j=1,2,\ldots ,n\); and \(M=2^{n}\). All possible sensor statuses form \(S(t)=[S_{1}(t),\ldots ,S_{M}(t)]\), as shown in Fig. 1.

Fig. 1 The status of every node in the network

Theorem 1

There are \(n\) sensors in the network, and \(\lambda =[\lambda _{1},\ldots ,\lambda _{\mathcal {T}}]\) is a sufficient statistic for the states \(S_{i}(t)\), where \(\lambda _{t}\) is defined as in formula (4).

Proof

At time \(t\), the conditional probability of the state \(S_{i}(t)\) given the state at time \(t-1\) is

$$\begin{aligned} Pr[S(t)&= S_{i}(t)| \lambda ] =Pr[s_{1}(t)=j_{1},\ldots ,s_{n}(t)=j_{n}| s_{1}(t-1)\nonumber \\&= \lambda _{1},\ldots ,s_{n}(t-1)=\lambda _{n}],i\in [1,M]. \end{aligned}$$
(6)

Since the nodes are independent of each other, we have

$$\begin{aligned}&Pr[S(t)=S_{i}(t)| \lambda ]\nonumber \\&\quad =Pr[s_{1}(t)\!=\!j_{1}|s_{2}(t)\!=\!j_{2},\ldots ,s_{n}(t)\!=\!j_{n}, s_{1}(t-1)=\lambda _{1},\ldots ,s_{n}(t-1)=\lambda _{n}]\nonumber \\&\quad \times Pr[s_{2}(t)=j_{2},\ldots ,s_{n}(t)=j_{n}| s_{1}(t-1)=\lambda _{1},\ldots ,s_{n}(t-1)=\lambda _{n}]\nonumber \\&\quad = \prod _{k=1}^{n}Pr[s_{k}(t)=j_{k}|s_{k}(t-1)=\lambda _{k}]. \end{aligned}$$
(7)

Thus, the state \(S_{i}(t)\) can be obtained from \(\lambda \); that is to say, \(\lambda =[\lambda _{1},\ldots ,\lambda _{\mathcal {T}}]\) is a sufficient statistic for the states \(S_{i}(t)\). \(\square \)

Under the initial belief state \(b^{0}\), the evolution of \(p^{k}=[p^{k}_{1},\ldots ,p^{k}_{n}]\) is difficult to express mathematically, but it is a standard nonlinear filtering operation. Quasi-Monte Carlo sampling will be used to obtain \(p^{k}\), as described in detail in Sect. 4.

3.3 Reward of sleep schedule problem

The reward in the sleep scheduling problem is identified in this section; it is decomposed into three components: the first is the profit for coverage and connectivity, the second is the tracking reward, and the third is an energy cost of \(c>0\) for each sensor that is awake. This model is used for simplicity; although the cost could differ from sensor to sensor, the energy cost here can be written as \(\sum _{l=1}^{n}c1\!\!1_{\{r^{k}_{l}=0\}}\).

Definition 2

(Coverage intensity) Suppose that there are \(n\) sensors \(\{v_{1},v_{2},\ldots ,v_{n}\}\) and \(m\) events \(\{e_{1},e_{2},\ldots ,e_{m}\}\) in the network. The network coverage intensity \(Cov\) is the ratio of the time during which an event happening in the field of the sensor network is covered by at least one active sensor node to the total time.

If \(k\) events are covered by an active sensor \(v_{i}\), we define \(cv_{i}=k\); for any active sensor \(v_{i}\in V\), its coverage intensity is Cov\((v_{i})=\frac{cv_{i}}{m} \). If the area is covered by multiple active sensors \(v_{i}\), \(i=1,2,\ldots ,n\), then the coverage intensity can be calculated as:

$$\begin{aligned} \mathrm{Cov}=E\left( \mathrm{Cov}\left( \bigcup _{v_{i} \in 1\!\!1_{\{r_{k}=0\}}}v_{i}\right) \right) =E\left( \sum _{v_{i}\in 1\!\!1_{\{r_{k}=0\}}}\frac{cv_{i}}{m} \right) \!. \end{aligned}$$
(8)

Definition 3

(Connectivity intensity) Let \(A\) and \(B\) be two different clusters. For any sensor \(u\) in \(A\) and any sensor \(v\) in \(B\), clusters \(A\) and \(B\) are called neighboring clusters if \(d(u,v)\le \sqrt{3}|\mathrm{radius}|\). The connection value between \(u\) and \(v\) is set to 1, i.e., Con\(_{uv}=1\), if both \(u\) and \(v\) are active sensors and there is at least one active sensor between them; otherwise, Con\(_{uv}=0\). Thereby, the connectivity intensity can be calculated as:

$$\begin{aligned} \mathrm{Con}=\sum _{u_{i}\in 1\!\!1_{\{r_{k}=0\}}} \sum _{v_{j}\in 1\!\!1_{\{r_{k}=0\}}} \frac{2\mathrm{Con}_{u_{i}v_{j}}}{|C_{(A)}|+|C_{(B)}|}, \end{aligned}$$
(9)

where \( i\in \{1,\ldots ,|C_{(A)}|\}, j\in \{1,\ldots ,|C_{(B)}|\}\).
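The following sketch evaluates the coverage intensity of Definition 2 and the pairwise connection value of Definition 3 for one realization of active sensors. The Euclidean-distance helper, the `sensing_radius` parameter, and the reading of "at least one active sensor between them" as a relay within range of both endpoints are assumptions introduced for illustration, since the paper leaves the sensing model implicit.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def coverage_intensity(active: Dict[str, Point], events: List[Point],
                       sensing_radius: float) -> float:
    """Sum over active sensors v_i of cv_i / m, as in Eq. (8), where cv_i is
    the number of events inside v_i's sensing range."""
    m = len(events)
    if m == 0:
        return 0.0
    total = 0.0
    for pos in active.values():
        cv_i = sum(1 for e in events if dist(pos, e) <= sensing_radius)
        total += cv_i / m
    return total

def connection_value(u: Point, v: Point, active_positions: List[Point],
                     radius: float) -> int:
    """Con_uv of Definition 3: 1 if u and v belong to neighboring clusters
    (d(u,v) <= sqrt(3)*radius) and some other active sensor can relay
    between them (an assumed reading of 'between them'); otherwise 0."""
    if dist(u, v) > math.sqrt(3) * radius:
        return 0
    relay = any(dist(w, u) <= radius and dist(w, v) <= radius
                for w in active_positions if w not in (u, v))
    return 1 if relay else 0
```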

The total reward for time slot \(k\) can be expressed as

$$\begin{aligned} R\left( p^{k},r_{k}\right)&=\sum p^{k}\bigg (\sum _{ l=1}^{n} 1\!\!1_{\{r^{k}_{l}=0\}}\gamma _{1}(\mathrm{Cov+Con}) \nonumber \\&\quad \quad +\gamma _{2}\bigg (1\!\!1_{\{r_{k}=0\}}e_{k}-1\!\!1_{\{r_{k}>0\}}o_{k}-\sum _{l=1}^{n}c1\!\!1_{\{r^{k}_{l}=0\}}\bigg )\bigg ). \end{aligned}$$
(10)

The tracking reward is defined as \(\textit{TR}=1\!\!1_{\{r_{k}=0\}}e_{k}-1\!\!1_{\{r_{k}>0\}}o_{k}\), which can be further decomposed into two components. The first component is the tracking gain obtained when sensors are awake and observe events \((e_{k})\) that occur. The second component is the observation error incurred when particular events \((o_{k})\) fail to be observed. Both \(\gamma _{1}\) and \(\gamma _{2}\) are discount factors.
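A per-slot reward sketch following one reading of Eq. (10) is given below. The indicator for whether the event is tracked, the coverage and connectivity intensities, and the tracking terms \(e_{k}\) and \(o_{k}\) are passed in as precomputed numbers; the default discount factors match those used in Sect. 6.

```python
def slot_reward(belief_weight: float, n_awake: int, cov: float, con: float,
                tracked: bool, e_k: float, o_k: float, c: float,
                gamma1: float = 0.5, gamma2: float = 0.5) -> float:
    """One reading of Eq. (10). `tracked` stands for the indicator 1{r_k = 0}:
    the sensor responsible for the event is awake at slot k."""
    coverage_term = n_awake * gamma1 * (cov + con)
    tracking_reward = e_k if tracked else -o_k   # TR = 1{r_k=0} e_k - 1{r_k>0} o_k
    energy_cost = c * n_awake                    # sum_l c * 1{r_l^k = 0}
    return belief_weight * (coverage_term + gamma2 * (tracking_reward - energy_cost))
```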

The cumulative reward for the system over the horizon \(\mathcal {T}\) is given by

$$\begin{aligned} R\left( p^{0},r_{0},\mu _{0},\mu _{1},\ldots ,\mu _{\mathcal {T}}\right) =E\left[ \sum _{k=1}^{\mathcal {T}} R\left( p^{k},r_{k}\right) \right] \!. \end{aligned}$$
(11)

The goal is to compute the solution to

$$\begin{aligned} R^{*}\left( p^{0},r_{0}\right) =\max _{\mu _{0},\mu _{1},\ldots ,\mu _{\mathcal {T}}}R\left( p^{0},r_{0},\mu _{0},\mu _{1},\ldots ,\mu _{\mathcal {T}}\right) \!. \end{aligned}$$
(12)

4 Suboptimal solutions

Since an optimal solution cannot be found for the sleep scheduling problem, this paper turns its attention to finding suboptimal solutions.

Each sensor selects its actions individually according to the Dec-POMDP policy, but it is the resulting joint action that produces the outcome; coordination is thus a key aspect of such systems. The goal of coordination is to make sure that the individual decisions of the sensors result in optimal decisions for the cluster as a whole. The joint policy is calculated online in real time. As long as all sensors in the same cluster maintain the same sub-belief pool, the plans they compute will be identical, which guarantees that the sleep scheduling strategies are coordinated.

Definition 4

(Belief pool) [23] A belief pool at time slot \(t\) is defined by a tuple \(\langle \{h_{i}^{t} \mid i\in I\}, p^{t}\rangle \), where \(h_{i}^{t}\) is a set of histories for sensor \(i\) and \(p^{t}\) is a set of joint belief states.

The whole belief pool is partitioned by clusters; each sub-belief pool stores the states of the sensors in the corresponding cluster.

This section introduces a new algorithm, the Dec-POMDP-based sleep scheduling algorithm with online planning (Dec-POP-SSA), to find suboptimal solutions to the sleep scheduling problem. This algorithm is executed in parallel by all sensors in the same cluster and is divided into three sub-phases: a planning phase, an executing phase and an updating phase. In the planning phase, each sensor computes a joint policy for all possible histories in the belief pool; the sensors in each cluster are initialized with empty history information and with their sub-belief pool set to \(b^{0}\). During the executing phase, each sensor adds its new local observation to its own local history and executes an action according to its component of the joint policy and its current local history. After that, during the updating phase, each sensor updates its own sub-belief pool based on the plan. An example involving two sensors in the same cluster is illustrated in Fig. 2. Two main problems need to be solved in the design:

  1. how to maintain the belief states;

  2. how to solve the joint policy.

4.1 Phase 1: generate the belief pool

For the first problem, an efficient data structure must be designed to represent a belief pool and the sensors' historical information. Each sub-belief pool stores the sub-belief state of its own cluster. When storing the history, it is not necessary to save the whole action-observation sequence in the belief pool, nor the joint history information of all the sensors; it suffices to maintain the sub-belief pool, which includes the sub-joint history information of the sensors in each cluster \((x)\). A hash table \(h_{(x)}^{t}=\langle \overrightarrow{\theta }_{(x)}, b_{(x)}^{t} \rangle \) is used to represent the joint history, where \(\overrightarrow{\theta _{(x)}}= \langle \theta _{(1)},\theta _{(2)},\ldots ,\theta _{(|C_{(x)}|)} \rangle \) is the joint index to a history, \(|C_{(x)}|\) is the number of nodes in cluster \((x)\), \(\theta _{(i)}\) is the history index for sensor \(i\), and \(b_{(x)}^{t}\) is the joint belief in the cluster, i.e., the joint history information induced within the cluster. In other words, it is the state probability distribution of the sensors in the same cluster. In each sub-belief pool, since only one history is kept for each policy, the history index for sensor \(i\) is represented as a tuple \(\theta _{(i)}=\langle a_{i}^{t-1},o_{i}^{t}\rangle \), where \(a_{i}^{t-1}\) is the policy tree index of the previous step and \(o_{i}^{t}\) is the observation of the current step. Therefore, the data structure is

$$\begin{aligned} h_{(x)}^{t}=\langle \langle a_{1}^{t-1},o_{1}^{t}\rangle ,\langle a_{2}^{t-1},o_{2}^{t} \rangle ,\ldots , \langle a_{|C_{(x)}|}^{t-1},o_{|C_{(x)}|}^{t}\rangle ,b_{(x)}^{t} \rangle \!. \end{aligned}$$
(13)

To compute the set of joint belief states for the current step, Bayes' rule is usually applied to the result of the previous step. In this paper, since the state transition and observation functions are unknown, the joint belief states cannot be computed through Bayes' rule.
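A minimal sketch of the sub-belief-pool entry of Eq. (13) is shown below: each cluster stores, per joint history, the per-sensor indices \(\langle a_{i}^{t-1},o_{i}^{t}\rangle \) together with the joint belief \(b_{(x)}^{t}\). The dictionary-based hash table and the type aliases are illustrative choices, not part of the paper.

```python
from typing import Dict, List, Tuple

HistoryIndex = Tuple[int, int]   # theta_i = (previous action a_i^{t-1}, current observation o_i^t)
BeliefState = Dict[int, float]   # probability mass over sampled states in the cluster

class SubBeliefPool:
    """Hash table h_(x)^t = <theta_(1), ..., theta_(|C_(x)|), b_(x)^t> for one cluster."""
    def __init__(self) -> None:
        self._table: Dict[Tuple[HistoryIndex, ...], BeliefState] = {}

    def store(self, joint_index: List[HistoryIndex], belief: BeliefState) -> None:
        self._table[tuple(joint_index)] = belief

    def lookup(self, joint_index: List[HistoryIndex]) -> BeliefState:
        return self._table[tuple(joint_index)]

# Example: a two-sensor cluster at step t
pool = SubBeliefPool()
pool.store([(0, 1), (2, 0)], {0: 0.7, 1: 0.3})  # theta_1 = (a=0, o=1), theta_2 = (a=2, o=0)
```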

Fig. 2 The framework of Dec-POP-SSA

Belief generation is a typical nonlinear filtering problem. Owing to weak system observability and a large initial error covariance, the standard particle filter is prone to degeneracy and rapid particle impoverishment, so its filtering performance is poor. Plain Monte Carlo random sampling cannot simulate the data effectively either: the samples tend to cluster and leave gaps, so the sample space is not fully explored and the estimation precision suffers.

One major advancement has come from the development of alternatives to the pseudo-random, or pseudo-Monte Carlo (PMC), number sequences on which simulation processes are based [34]. The use of so-called quasi-random or quasi-Monte Carlo (QMC) sequences can lead to significant savings in simulation and estimation by enabling stable performance with a smaller number of draws. In transportation research, the most commonly used type of quasi-random sequence is the Halton sequence [35, 36].

Definition 5

(Halton sequence) [36] For any prime number \(q\) and any positive integer \(D\), there is a unique base \(q\) representation

$$\begin{aligned} D=\sum _{j\ge 0}Z_{j}q^{j}. \end{aligned}$$
(14)

The \(Z_{j}=Z_{j}(q,D)\in \{0,\ldots ,q-1\}\) are the digits of \(D\) in base \(q\). For any \(D < q^{m}\), this sequence has all zero entries in positions \(j \ge m\). With the digit sequence \((Z_{j}) = (Z_{j}(D))\) in hand, the radical inverse is defined as

$$\begin{aligned} \beta _{q}(D) := \sum _{j\ge 0}Z_{j}q^{-j-1}. \end{aligned}$$
(15)

Given the space dimension \(d\ge 1\), the first \(d\) prime numbers \(y_{1},\ldots ,y_{d}\) are chosen. The sequence of points \((\widehat{x}_{k})_{k\in N}\) in \([0,1]^{d} \) is then defined by

$$\begin{aligned} \widehat{x}_{k}:=(\beta _{y_{1}}(k),\ldots ,\beta _{y_{d}}(k)). \end{aligned}$$
(16)

The quasi-Monte Carlo method will be used to compute the joint belief states. For each decision cycle \(t\), \(\mathcal {K}\) deterministic sample points \(\left\{ x_{t}^{(i)}\right\} _{i=1}^{\mathcal {K}}\) are selected by the QMC method, which yields a better estimation precision than the Monte Carlo method. The simulator can then be used to calculate the state distribution in the absence of a complete model, but the amount of computation grows exponentially. Particle aggregation is therefore applied in every cluster to reduce the sample size while ensuring that the samples move toward high-likelihood regions, which improves the sampling accuracy.
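A minimal sketch of the Halton-based QMC sampler of Definition 5 is given below: `radical_inverse` implements Eq. (15) and `halton_point` Eq. (16). Only the first few primes are hard-coded for illustration; the resulting deterministic points can then weight the simulator-generated states when estimating \(p^{k}\).

```python
def radical_inverse(q: int, d: int) -> float:
    """beta_q(D) of Eq. (15): reflect the base-q digits of D about the radix point."""
    value, inv_base = 0.0, 1.0 / q
    while d > 0:
        d, digit = divmod(d, q)
        value += digit * inv_base
        inv_base /= q
    return value

PRIMES = [2, 3, 5, 7, 11, 13]  # first primes y_1, ..., y_d (extend as needed)

def halton_point(k: int, dim: int) -> tuple:
    """x_hat_k of Eq. (16): the k-th point of the dim-dimensional Halton sequence."""
    return tuple(radical_inverse(PRIMES[j], k) for j in range(dim))

# K deterministic sample points in [0,1]^2 for one decision cycle
K = 100
samples = [halton_point(k + 1, 2) for k in range(K)]
```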

At the end of each decision cycle, the index of each history is updated. Then the cluster head of each cluster samples the joint history by the QMC method and calculates the corresponding joint beliefs within the cluster. The historical information is a sequence containing each sensor's action at every step and the observation received from the cluster environment. An example of one sub-belief pool is shown in Fig. 3: for a node whose action is \(a_{1}\), when it obtains observation \(o_{1}\), the probability of transitioning to action \(a_{2}\) is 0.9 and to \(a_{3}\) is 0.1; when it obtains observation \(o_{2}\), the probability of transitioning to \(a_{2}\) is 0.4 and to \(a_{3}\) is 0.6. The belief states are obtained by the QMC method. Each node decides its next action based on this belief pool and its current action. For example, for node \(A\), assume its current action is \(a_{1}\); if it obtains local observation \(o_{1}\), according to the belief pool it transitions to action \(a_{2}\). For node \(B\), assume its current action is \(a_{2}\); if it obtains local observation \(o_{2}\), its action does not change. The algorithm for generating the state distribution by the QMC method is shown in Algorithm 1.
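The lookup just described can be sketched as follows for the example of Fig. 3: given a node's current action and its local observation, the sub-belief pool returns a distribution over next actions, from which the node draws its transition. The transition probabilities are the ones quoted in the text; the dictionary layout is an illustrative choice.

```python
import random

# (current_action, observation) -> {next_action: probability}, as in Fig. 3
sub_belief_pool = {
    ("a1", "o1"): {"a2": 0.9, "a3": 0.1},
    ("a1", "o2"): {"a2": 0.4, "a3": 0.6},
}

def next_action(current_action: str, observation: str) -> str:
    """Draw the next action from the sub-belief pool; if the pair is not
    stored, the node keeps its current action (as node B does above)."""
    dist = sub_belief_pool.get((current_action, observation))
    if dist is None:
        return current_action
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs, k=1)[0]

print(next_action("a1", "o1"))  # node A: moves to a2 with probability 0.9
print(next_action("a2", "o2"))  # node B: pair not stored, action unchanged
```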

Fig. 3 An example of one sub-belief pool

Algorithm 1 Generating the state distribution by the QMC method

4.2 Phase 2: history merging

During the executing phase, note that the observations of other sensors in the same cluster are not available. Each sensor must consider all possible histories that could be observed by the other sensors in the same cluster and how they may affect its own action selection. However, the number of possible joint histories increases exponentially with the horizon. A more detailed analysis of the planning phase shows that most of the joint histories kept in memory are useless. Although sensors do not share private information across the whole network, the sensors in a cluster can maintain the histories based on the information they have.

In the online planning, the policy-equivalence condition facilitates sub-policy reuse, i.e., mapping several histories to a single sub-policy; handling several histories in this way is also called merging. For example, if some active sensors do not observe any event for a period of time and go to sleep mode, their earlier state histories carry no useful information. If more than two clusters have similar environments, the same policy can be applied to all of them, and any one of their histories can be chosen to be recorded in the belief pool. Furthermore, within the same cluster, sensor \(i\) needs to know \(h_{-i}\), every possible history combination of the other sensors; since every single value may affect the policy of sensor \(i\), considering every possible \(h_{-i}\) is important. Algorithm 2 describes the history merging procedure in detail.

Algorithm 2 History merging
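A minimal sketch of the merging step is given below, under the assumption that two histories are policy equivalent when they induce approximately the same belief state; equivalent histories are then collapsed onto a single representative, so only one of them is recorded in the sub-belief pool. The tolerance `eps` is an illustrative parameter.

```python
from typing import Dict, List, Tuple

Belief = Dict[int, float]
History = Tuple  # sequence of (action, observation) pairs

def beliefs_close(b1: Belief, b2: Belief, eps: float = 1e-3) -> bool:
    states = set(b1) | set(b2)
    return all(abs(b1.get(s, 0.0) - b2.get(s, 0.0)) < eps for s in states)

def merge_histories(entries: List[Tuple[History, Belief]]) -> List[Tuple[History, Belief]]:
    """Map several histories onto a single sub-policy: keep one representative
    history per (approximately) equal belief and discard the rest."""
    merged: List[Tuple[History, Belief]] = []
    for hist, belief in entries:
        if not any(beliefs_close(belief, kept) for _, kept in merged):
            merged.append((hist, belief))
    return merged
```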

For the second problem, how to solve the joint policy: without knowledge of the observations of the other sensors, every sensor must consider all possible belief states that could be held by the others. In order to find sensor \(i\)'s sleep scheduling policy \(\mu _{i}\) for history \(h_{i}\), sensor \(i\) needs to consider every possible history \(h_{-i}\) in the same cluster, together with the possible policies associated with it. That is to say, a joint policy \(\overrightarrow{\mu }\) needs to be found that maximizes the reward function:

$$\begin{aligned} R(\overrightarrow{\mu })=\sum _{h\in H}\sum _{s\in S}pR(\mu (h),s). \end{aligned}$$
(17)

The goal of the Dec-POP-SSA is that each sensor independently calculates the joint policy \(\mu (h)\) for the cluster and then executes its share of the policy based on its own local history.

The Dec-POP-SSA algorithm consists of three phases, which are described in detail below. In the planning phase, each node computes its own policy offline; the policy is represented as a layered structure, where each layer corresponds to one decision cycle. A fixed number of decision nodes lies at each layer, and each node contains the selected actions and the belief distribution calculated by QMC sampling. In the executing phase, each node chooses an action according to the action distribution and then, according to the observation from the environment, moves to the next layer. In the online setting, the execution of the algorithm is simple and clear, so the key is how to generate such a policy, which is created in the planning stage: in every iteration step a policy is chosen for each node, and the state distribution is then calculated through QMC sampling. In the updating phase, the policy is optimized and refined according to this distribution, and the process continues until no policy can be further improved.

As shown in Fig. 2, information can be shared among the sensors before entering the planning phase. However, in GWSNs communication is costly when the active sensors are far from each other, so the communication frequency of each sensor should be limited. Sensors therefore share information and coordinate their actions only when the belief pool is detected to disagree with the information from the environment; that is, belief inconsistency is detected when a sensor's local observation does not agree with the projected joint belief.

Definition 6

(Observation inconsistency) For sensor \(i\), at time slot \(t\), the sub-belief pool \(B^{t}\) obtained by QMC is said to be \(\varepsilon _{2}\)-inconsistent with \(i\)'s observation \(o_{i}^{t}\) if

$$\begin{aligned} |b(s)-p_{(x)}^{t}(b)|\ge \varepsilon _{2}, \end{aligned}$$
(18)

where \(p_{(x)}^{t}(b)\) is the sub-belief in cluster \((x)\) at time slot \(t\) obtained by QMC sampling, and sensor \(i\) is in cluster \((x)\).

This definition is used to monitor inconsistency between sensor \(i\)'s local history and its observation, which indicates historical inconsistency in the pool. If an inconsistency occurs, the sub-belief pool can be refreshed by communicating with the other sensors; the refreshed sub-belief pool then contains the real joint history and is consistent with the observation, which leads to a better joint policy and, at the same time, reflects the value of communication for decision making.
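A small sketch of the \(\varepsilon _{2}\)-inconsistency test of Definition 6 follows: the locally implied belief is compared against the QMC-projected sub-belief, and a sufficiently large deviation triggers a communication round to refresh the sub-belief pool. The threshold value and the callback are illustrative.

```python
def is_inconsistent(local_belief: float, projected_belief: float,
                    eps2: float = 0.1) -> bool:
    """Definition 6: the sub-belief pool is eps2-inconsistent with the sensor's
    observation when |b(s) - p_(x)^t(b)| reaches the threshold."""
    return abs(local_belief - projected_belief) >= eps2

def maybe_refresh(local_belief: float, projected_belief: float, refresh_pool) -> None:
    """Communicate with the other cluster members only on inconsistency,
    keeping the communication frequency low."""
    if is_inconsistent(local_belief, projected_belief):
        refresh_pool()  # exchange observations and rebuild the joint history
```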

A remaining issue is how to choose the heuristic policy; better results can be obtained by combining different heuristic policies. In a model-free environment, the simplest heuristic policy is the random policy, in which each sensor chooses an action randomly. Alternatively, a manually coded policy based on common sense can be used. The performance of the online algorithm depends on the heuristic, and the quality of the heuristic mainly depends on the actual problem. In this paper, based on the cluster structure, letting the cluster heads act while the cluster members go to sleep is the best heuristic. After the heuristic is given, the Dec-POP-SSA algorithm obtains an approximately optimal joint policy by constructing and solving a series of linear programs.
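A sketch of this cluster-based heuristic, used only to initialize the planning phase, is given below: cluster heads are kept awake (sleep input 0) while cluster members are assigned a default sleep duration. The `default_sleep` value is an illustrative parameter.

```python
from typing import Dict, List

def heuristic_initial_policy(clusters: Dict[str, List[str]],
                             default_sleep: int = 3) -> Dict[str, int]:
    """Deterministic manually coded policy: cluster heads stay awake,
    cluster members receive a default sleep time."""
    sleep_inputs: Dict[str, int] = {}
    for head, members in clusters.items():
        sleep_inputs[head] = 0          # cluster head keeps sensing
        for sensor in members:
            sleep_inputs[sensor] = default_sleep
    return sleep_inputs

# Example: two clusters with their member sensors
policy0 = heuristic_initial_policy({"h1": ["s1", "s2"], "h2": ["s3"]})
# {'h1': 0, 's1': 3, 's2': 3, 'h2': 0, 's3': 3}
```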

To start the algorithm, each sensor's local policy is initialized to the deterministic, manually coded policy. Then each sensor \(i\)'s policy is improved by finding the best policy \(\mu _{i}(a_{i}|h_{i})\) satisfying

$$\begin{aligned} R(\overrightarrow{\mu })+ \varepsilon _{1} \le \sum _{i\in C_{(x)}}p_{(x)}^{k}(b)\left[ \sum _{\overrightarrow{a}}\mu _{i} (a_{i}|h_{i}) \mu _{-i} (a_{-i}|h_{-i})R(\overrightarrow{a},b(h))\right] , \end{aligned}$$
(19)

where \(\mu _{-i} (a_{-i}|h_{-i})=\prod _{k\ne i}\mu _{k} (a_{k}|h_{k}) \).

Algorithm 3 describes the Dec-POP-SSA algorithm in detail:

5 Theoretical analysis

An upper bound on the expected reward is derived by a method similar to those in [18, 20]. This section derives an upper bound for the state space obtained by QMC sampling. First, Lemma 1 provides an upper bound on the expected reward of one sensor; second, Lemma 2 provides an upper bound on the expected reward of one cluster; finally, Theorem 2 provides an upper bound on the expected reward of the whole network, and Theorem 3 gives the space and time complexity of Dec-POP-SSA.

Algorithm 3 Dec-POP-SSA

Lemma 1

Given the current belief \(p^{k}\) obtained from the QMC sampling, an action vector \(u_{k}\) computed through the Dec-POP-SSA algorithm, the current residual-sleep-time vector \(r_{k}\), and the coverage and connectivity intensities, the expected reward is upper bounded by

$$\begin{aligned} E[R\mid p^{k},u_{k}]&\le E(R_{0})-\gamma _{2}\sum _{i=1}^{m}p^{k}_{i} \sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}=i)\nonumber \\&\quad \times \max _{k\ne j} Q\left( \frac{d_{kj}}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) . \end{aligned}$$
(20)

Proof

An upper bound on the reward is derived given the current belief \(p^{k}\) obtained from QMC sampling. The active sensors must meet certain connectivity and coverage intensities. When the state distribution obtained by QMC sampling meets the basic connectivity and coverage requirements, the resulting reward is an upper bound. A large body of literature on network connectivity and coverage shows that the number of active sensors is related to the network area; when the network area is fixed, the required number of active sensors is fixed, with the specific value determined by the application. This article does not describe these details; instead, the basic reward is defined in advance:

$$\begin{aligned} E(R_{0})&= E\left[ \sum p^{k}(b)\left( \sum _{ l=1} ^{n} 1\!\!1_{\{r^{k}_{l}=0\}}\gamma _{1}\mathrm{(Cov+Con)}-\gamma _{2}\sum _{l=1}^{n}c1\!\!1_{\{r^{k}_{l}=0\}}\right) \right] . \end{aligned}$$
(21)

With \(TR=1\!\!1_{\{r_{k}=0\}}e_{k}-1\!\!1_{\{r_{k}>0\}}o_{k}\) denoting the tracking term, the expected reward can be written as

$$\begin{aligned} E\left[ R\mid p^{k},u_{k}\right] =E(R_{0})-E[\gamma _{2}(TR)], \end{aligned}$$
(22)

where

$$\begin{aligned}&E[\gamma _{2}(TR)]= E\left[ \gamma _{2}\left( 1\!\!1_{\{r_{k}=0\}}e_{k}-1\!\!1_{\{r_{k}>0\}}o_{k}\right) \right] \nonumber \\&\quad =E\left[ \gamma _{2}\left( \sum _{j=1}^{m}Pr\left[ \widehat{d}_{k+1}\ne j \mid p^{k},u_{k},r_{k},d_{k+1}= j\right] \times Pr\left[ d_{k+1}= j \mid p^{k},u_{k}\right] \right) \right] \nonumber \\&\quad =\gamma _{2}\left( \sum _{i=1}^{m}p^{k}_{i} \sum _{j=1}^{m} p\left( d_{k+1}= j \mid d_{k}=i\right) \times Pr\left[ \widehat{d}_{k+1}\ne j \mid p^{k},u_{k},r_{k},d_{k+1}= j\right] \right) .\nonumber \\ \end{aligned}$$
(23)

If the states obtained by QMC are close to the real distribution, the reward attains this upper bound. The probability of a tracking error under hypothesis \(H_{j}\) is defined as:

$$\begin{aligned} P(E\mid H_{j})=Pr[\widehat{d}_{k+1}\ne j \mid p^{k},u_{k},r_{k},d_{k+1}= j]. \end{aligned}$$
(24)

Using standard analysis for likelihood ratio tests [20,38], we have

$$\begin{aligned} P(E\mid H_{j})\ge \max _{k\ne j} Q\left( \frac{d_{kj}}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \!, \end{aligned}$$
(25)

where \(d_{kj}^{2}=\frac{\Delta m_{kj}^{T} \Delta m_{kj}}{\sigma ^{2}}\), \(\Delta m_{kj}=m_{k}-m_{j}\), and \(m_{j}\) is the mean received signal strength when the target is at state \(j\). \(Q(\cdot )\) is the Gaussian \(Q\)-function (the tail probability of the standard normal distribution); the quantity \(d_{kj}\) plays the role of a distance between the two hypotheses and hence depends on the difference of their corresponding mean vectors and on the noise variance \(\sigma ^{2}\).

The network involves a communication channel that is assumed perfectly reliable; it is assumed to be always available, and the noise variance is normalized so that \(\sigma =1\). The belief state at time slot \(k\) in cluster \((x)\) obtained by QMC sampling is denoted \(p^{k}_{(x)}(b)\); like the initial belief state \(b^{0}=p(b)\), it is a \(|(x)|\times |(x)|\) matrix. Its \(j\)th element is \([p_{(x)}^{k}(b)]_{j}\), the entry equal to the belief state in cluster \(j\). For simplicity, we write \(\left[ p^{k}\right] _{j}=\left[ p_{(x)}^{k}(b)\right] _{j}\) in this section.

So, we have

$$\begin{aligned} E\left[ R\mid p^{k},u_{k}\right]&\le E(R_{0}) -\gamma _{2}\sum _{i=1}^{m}p^{k}_{i} \sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}=i)\nonumber \\&\quad \times \max _{k\ne j} Q \left( \frac{d_{kj}}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \!. \end{aligned}$$
(26)

\(\square \)

Lemma 2

The expected reward by one cluster \((x)\) is upper bounded by:

$$\begin{aligned} E\left[ R_{(x)}\mid p^{k},u_{k},r_{k}\right]&\le \sum _{l\in C_{(x)}} p^{k}(b) \left\{ 1\!\!1_{\{r^{k+1}_{l}=0\}} \sum _{i=1}^{m}p_{i}^{k}E_{0}(p;i,l)\right. \nonumber \\&\quad \left. +\, 1\!\!1_{\{r^{k+1}_{l}>0\}} \sum _{i=1}^{m}p^{k}_{i}E_{0_{-l}}(p;i,l)\right\} \!. \end{aligned}$$
(27)

Proof

The effect of each sensor on the reward is:

$$\begin{aligned} E_{0}(p;i,l)&= E(R_{0})-\gamma _{2}\sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}= i)\nonumber \\&\quad \times \max _{k\ne j} Q_{1\!\!1_{\{r^{k+1}_{l}=0\}}} \left( \frac{d_{kj}(1\!\!1)}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \!, \end{aligned}$$
(28)

which is the reward when all sensors are active at the next time slot.

$$\begin{aligned} E_{0_{-l}}(p;i,l)&= E(R_{0})-\gamma _{2}\sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}= i)\nonumber \\&\quad \times \max _{k\ne j} Q_{1\!\!1_{-l}} \left( \frac{d_{kj}(1\!\!1_{-l})}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \!. \end{aligned}$$
(29)

Let \(1\!\!1_{-l}\) denote a vector of length \(n\) with all entries equal to one except for the \(l\)-th entry being zero.

The expected reward in cluster \((x)\) can be readily upper bounded:

$$\begin{aligned} E[R_{x}\mid p^{k},u_{k},r_{k}]&\le 1\!\!1_{\{r^{k+1}_{l}=0\}} E[R_{x}\mid p^{k},r_{k+1}=0] \nonumber \\&\quad +\, 1\!\!1_{\{r^{k+1}_{l}>0\}} E[R_{x}\mid p^{k},r^{k+1}_{i}=0,\quad \forall i\ne l]. \end{aligned}$$
(30)

Since this holds for every \(l\in C_{(x)}\), an upper bound on the expected reward in cluster \((x)\) can be written as a convex combination of all sensors' contributions

$$\begin{aligned}&E[R_{x}\mid p^{k},u_{k},r_{k}]\nonumber \\&\quad \le \sum _{l\in C_{(x)}} p^{k}(b) \{1\!\!1_{\{r^{k+1}_{l}=0\}} E[R_{x}\mid p^{k},r_{k+1}=0] + 1\!\!1_{\{r^{k+1}_{l}>0\}} E[R_{x}\mid p^{k},r^{k+1}_{l}\nonumber \\&\quad =0,\forall i\ne l]\}\nonumber \\&\quad \le \sum _{l\in C_{(x)}} p_{k}(b) \left\{ 1\!\!1_{\{r^{k+1}_{l}=0\}} \left( E(R_{0}) -\gamma _{2}\sum _{i=1}^{m}p_{i}^{k} \sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}=i)\right. \right. \nonumber \\&\left. \left. \quad \quad \times \max _{k\ne j} Q_{1\!\!1_{\{r^{k+1}_{l}=0\}}}\left( \frac{d_{kj}(1\!\!1)}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \right) + 1\!\!1_{\{r^{k+1}_{l}>0\}} \left( E(R_{0})\right. \right. \nonumber \\&\left. \quad \quad -\gamma _{2}\sum _{i=1}^{m}p_{i}^{k} \sum _{j=1}^{m} p(d_{k+1}= j \mid d_{k}=i)\right. \nonumber \\&\left. \left. \quad \quad \times \max _{k\ne j} Q_{1\!\!1_{-l}}\left( \frac{d_{kj}(1\!\!1_{-l})}{2}+\frac{ln\frac{[p^{k+1}]_{j}}{[p^{k+1}]_{k}}}{d_{kj}}\right) \right) \right\} . \end{aligned}$$
(31)

So

$$\begin{aligned} E[R_{x}\mid p^{k},u_{k},r_{k}]&\le \sum _{l\in C_{(x)}} p^{k}(b) \left\{ 1\!\!1_{\{r^{k+1}_{l}=0\}} \sum _{i=1}^{m}p_{i}^{k}E_{0}(p;i,l)\right. \nonumber \\&\left. \quad +\, 1\!\!1_{\{r^{k+1}_{l}>0\}} \sum _{i=1}^{m}p_{i}^{k}E_{0_{-l}}(p;i,l)\right\} . \end{aligned}$$
(32)

\(\square \)

Theorem 2

An upper bound on the Dec-POP-SSA algorithm at belief state \(b^{0}\) can be obtained as a solution to the following optimization problem with QMC sampling:

$$\begin{aligned} R^{*}&= max \left\{ \sum _{ l=1}^{|C|} \max _{u_{l}}\left\{ \sum _{j=0} ^{u-1} \sum _{ i=1 }^{|C_{(i)}|} [p_{(i)}^{j}(b)]_{i}E_{0_{-l}}(i,l) +\sum _{ i=1 }^{|C_{(i)}|} [p_{(i)}^{u}(b)]_{i}E_{0}(i,l)\right. \right. \nonumber \\&\quad \left. \left. -c\sum _{ i=1} ^{C_{(x)}}\left[ b_{j}p_{(i)}^{u+1}(b)\right] _{i} +\sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{u+1}(b)\right] _{i}J^{l}(b_{i})\right\} \omega _{k} ^{l}/\sum _{ l=1}^{|C|}\omega _{k} ^{l}\right\} \!. \end{aligned}$$
(33)

Proof

If the state sampled by QMC could be perfectly observed after taking the sleeping action, an optimal policy similar to [17] could be found by solving the Bellman equation, which is an equation for the per-sensor problem under the assumption of no future observations. Note that it considers two cases: the first is a residual sleep time of 0, and the second is a residual sleep time greater than 0. In each case, the two terms inside the maximization represent the basic reward and the tracking cost. \(b_{i}\) equals the initial belief state in cluster \((i)\), and there are \(|C_{(i)}|\) sensors in cluster \((i)\). So, we have

$$\begin{aligned}&R_{(l)}^{*}=\max _{l \in C_{(i)}}\left\{ 1\!\!1\{r^{k}_{l}=0\} \times \left( \sum _{b}p(b) E_{0}(p;b,l) +\sum _{ i=1} ^{|C_{(i)}|}[p_{(i)}^{1}(b)]_{i}R(b_{i},0)\right) \right. \nonumber \\&\left. \qquad \qquad +\, 1\!\!1\{r^{k}_{l}>0\} \times \left( \sum _{b}p(b)\lambda _{l} E_{0_{-l}}(p;b,l) +\sum _{ i=1} ^{|C_{(i)}|}[p_{(i)}^{1}(b)]_{i}R(b_{i},u_{l})\right) \right\} \!.\qquad \end{aligned}$$
(34)

\(R(b_{i},u)\) can be obtained by recursive iteration until the system reaches \((b_{i},0)\),

$$\begin{aligned} R_{l}(b_{j},u)=\lambda _{l}^{j} E_{0_{-l}}(b_{0};i,l) +\sum _{ i=1} ^{|C_{(i)}|}[b_{j}p_{(i)}(b)]_{i}R^{l}(b_{i},u-1), \end{aligned}$$
(35)

and

$$\begin{aligned} R_{l}(b_{j},1)=\lambda _{l}^{j} E_{0}(b_{0};i,l)-c \sum _{ i=1} ^{|C_{(i)}|}[b_{j}p_{(i)}(b)]_{i} +\sum _{ i=1} ^{|C_{(i)}|}[b_{j}p_{(i)}(b)]_{i}J^{l}(b_{i},0). \end{aligned}$$
(36)

Assume \(u_{l}\) is a controllable behavior, namely the behavior of sensor \(l\) under a given belief state \(p(b)\). Since an action needs to be chosen when the sensor wakes up, the actions at each \((b_{i},0)\) need to be defined. It is easy to see that an upper bound for cluster \((i)\) can be obtained as the solution of the maximization problem in Eq. (37).

$$\begin{aligned} R_{(l)}^{*}&= \max _{u_{l}}\left\{ \sum _{j=0} ^{u-1} \sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{j}(b)\right] _{i}E_{0_{-l}}(i,l) +\sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{u}(b)\right] _{i}E_{0}(i,l)\right. \nonumber \\&\left. -c\sum _{ i=1} ^{C_{(i)}}\left[ b_{j}p_{(i)}^{u+1}(b)\right] _{i} +\sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{u+1}(b)\right] _{i}J^{l}(b_{i})\right\} . \end{aligned}$$
(37)

According to the particle aggregation of the Dec-POP-SSA algorithm, assume there are \(|C|\) clusters in the network. Then \(R^{*}=\sum _{ l=1}^{|C|}R_{(l)}^{*}\omega _{k} ^{l}/\sum _{ l=1}^{|C|}\omega _{k} ^{l}.\)

Together with Eq. (37), we can determine an upper bound on the total expected reward,

$$\begin{aligned} R^{*}&= \max \left\{ \sum _{ l=1}^{|C|} \max _{u_{l}}\left\{ \sum _{j=0} ^{u-1} \sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{j}(b)\right] _{i}E_{0_{-l}}(i,l) +\sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{u}(b)\right] _{i}E_{0}(i,l)\right. \right. \nonumber \\&\left. \left. -c\sum _{ i=1} ^{C_{(i)}}\left[ b_{j}p_{(i)}^{u+1}(b)\right] _{i} +\sum _{ i=1 }^{|C_{(i)}|} \left[ p_{(i)}^{u+1}(b)\right] _{i}J^{l}(b_{i})\right\} \omega _{k} ^{l}/\sum _{ l=1}^{|C|}\omega _{k} ^{l}\right\} . \end{aligned}$$
(38)

\(\square \)

Theorem 3

The space complexity of Dec-POP-SSA is \(\mathcal {O}(n\mathcal {T}\mathcal {K})\) and its time complexity is \(\mathcal {O}(\mathcal {T}^{2})\), where \(n\) is the number of nodes in the network, \(\mathcal {T}\) is the number of policy steps, and \(\mathcal {K}\) is the minimum number of samples.

Proof

First, we compute the minimum number of samples. Assume that \(\mathcal {X}\) is a random variable taking values in [\(\mathcal {X}_\mathrm{{min}}\), \(\mathcal {X}_\mathrm{{max}}\)], and let \(\overline{\mathcal {X}}=E[\mathcal {X}]\) be its expectation under the QMC method. Suppose that \(\{x_{1},x_{2},\ldots ,x_{\mathcal {K}}\}\) is a sample sequence whose values also lie in [\(\mathcal {X}_\mathrm{{min}}\), \(\mathcal {X}_\mathrm{{max}}\)], with mean \(\widetilde{\mathcal {X}}=\frac{1}{\mathcal {K}} \sum _{ k=1} ^{\mathcal {K}} x_{k}\). To keep the sample mean \(\widetilde{\mathcal {X}}\) close to the expectation \(\overline{\mathcal {X}}\), enough samples need to be collected; however, the more samples are taken, the higher the communication complexity. We therefore want to collect as few samples as possible while keeping the sample mean close to the expectation. The Hoeffding inequality [38] can be used to calculate this minimum number of samples. According to the Hoeffding inequality, we have

$$\begin{aligned} P(\widetilde{\mathcal {X}}\le \overline{\mathcal {X}}+\delta )\ge 1-e^{-2\mathcal {K}\delta ^{2}/(\mathcal {X}_\mathrm{{max}}-\mathcal {X}_\mathrm{{min}})^{2}}, \end{aligned}$$
(39)

and

$$\begin{aligned} P(\widetilde{\mathcal {X}}\ge \overline{\mathcal {X}}-\delta )\ge 1-e^{-2\mathcal {K}\delta ^{2}/(\mathcal {X}_\mathrm{{max}}-\mathcal {X}_\mathrm{{min}})^{2}}, \end{aligned}$$
(40)

where \(\delta \) refers to a smaller threshold.

Set another confidence threshold \(\vartheta \). For \(\widetilde{\mathcal {X}}\) to meet this condition, the required minimum number of samples is

$$\begin{aligned} \mathcal {K}=\frac{(\mathcal {X}_\mathrm{{max}}-\mathcal {X}_\mathrm{{min}})^{2} ln (1/\vartheta )}{2\delta ^{2}}. \end{aligned}$$
(41)
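Equation (41) is a one-line computation; a small sketch is given below, with illustrative numeric values for the range, \(\delta \) and \(\vartheta \).

```python
import math

def min_samples(x_max: float, x_min: float, delta: float, theta: float) -> int:
    """Eq. (41): K = (X_max - X_min)^2 * ln(1/theta) / (2 * delta^2)."""
    return math.ceil((x_max - x_min) ** 2 * math.log(1.0 / theta) / (2 * delta ** 2))

# Example: values bounded in [0, 1], delta = 0.1, confidence threshold theta = 0.05
print(min_samples(1.0, 0.0, 0.1, 0.05))  # 150 samples
```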

In the Dec-POP-SSA algorithm, the number of policy steps \(\mathcal {T}\) is fixed, and at each step a sensor is selected to improve the pre-generated policy. For cluster \((x)\), which has \(|C_{(x)}|\) sensors, the overall storage requirement is \(\mathcal {O}(|C_{(x)}|\mathcal {T}\mathcal {K})\). Since there are \(n\) nodes in the network, the space complexity of Dec-POP-SSA is \(\mathcal {O}(n\mathcal {T}\mathcal {K})\).

During the iterations, QMC sampling is performed repeatedly; the total time needed is \(1+2+\cdots +\mathcal {T}\), so the time complexity is quadratic in the number of decision steps, \(\mathcal {O}(\mathcal {T}^{2})\). \(\square \)

6 Numerical analysis

This section gives simulation results that illustrate the performance of the Dec-POP-SSA algorithm:

  1. The QMC sampling can accurately describe the distribution of the state, so it can be used by Dec-POP-SSA to solve the sleep scheduling problem in GWSNs.

  2. The tracking errors of Dec-POP-SSA are lower than those of QMDP and FCR.

  3. The Dec-POP-SSA algorithm achieves the highest reward.

It is assumed that events occur at locations following a multi-peak Gaussian distribution, structured as in Formula (42); the results of the simulation runs were averaged.

$$\begin{aligned} P(x,y)&= \frac{(1-x)^{2}}{4}e^{-x^{2}-(y+1)^{2}}-\frac{5}{6}\left( \frac{x}{5}-x^{3}-y^{5}\right) e^{-x^{2}-y^{2}}\nonumber \\&\quad - \frac{1}{36}e^{-(x+1)^{2}-y^{2}}+\frac{1}{2}. \end{aligned}$$
(42)
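For reproducibility, Formula (42) can be transcribed directly as code; the function below is simply that transcription.

```python
import math

def event_probability(x: float, y: float) -> float:
    """Multi-peak Gaussian surface of Formula (42)."""
    term1 = ((1 - x) ** 2 / 4.0) * math.exp(-x ** 2 - (y + 1) ** 2)
    term2 = (5.0 / 6.0) * (x / 5.0 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
    term3 = (1.0 / 36.0) * math.exp(-(x + 1) ** 2 - y ** 2)
    return term1 - term2 - term3 + 0.5
```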

For the nonlinear filtering operation, the value of \(p_{(x)}^{k}(b)\) for each cluster \((x)\) and time slot \(k\) is generated by averaging 50 QMC simulations. The probability distribution of events in the designated area is shown in Fig. 4, and one example of the QMC simulations, with 100 samples per run, is also shown in this figure by '+' markers.

Fig. 4 The probability distribution of events and an example of the QMC sampling

In Fig. 5, samples are taken at 10 time slots, with 100 QMC points sampled per time slot, and the results of 50 simulation runs were averaged when plotting the points. From the figure, it is clear that the value of \(p_{(x)}^{k}(b)\) in every simulation is around 0.5 and the peaks are almost all captured, which is consistent with the multi-peak Gaussian distribution defined by Formula (42).

Fig. 5 The sampling at 10 time slots

In the case of the QMDP policies, the value of \(p_{(x)}^{k}(b)\) is generated by 100 MC samples. For a detailed comparative analysis of the distributions, Fig. 6 shows one example each of a QMC simulation and an MC simulation. One plot shows the result of regular Monte Carlo sampling, i.e., 100 points selected randomly. Random points tend to cluster, over-sampling the unit square in some places and leaving gaps in others, where the sample space is not explored at all. It is therefore difficult to reach the peak values with MC sampling. For example, for samples 65–100, almost all values are around 0.5; because the samples are too dense, the distribution of the original belief is hard to recover. The other plot shows the result of QMC, i.e., 100 points of a two-dimensional Halton sequence. It can be observed that the peaks are reached under the QMC sampling method, and its distribution clearly exhibits the multi-peak Gaussian shape. It is therefore evident that QMC sampling can accurately describe the distribution of the state and can be used by Dec-POP-SSA to solve the sleep scheduling problem in GWSNs.

Fig. 6 The QMC and MC sampling at one time slot

The performance of Dec-POP-SSA is illustrated numerically and compared with the QMDP and FCR versions of the greedy algorithm [19]. As in [19], the algorithms are executed in a network whose sensor field is sufficiently dense that the sensors cover the entire area of interest, i.e., the algorithms satisfy the basic connectivity and coverage intensities. 100 sensors with transmission radius 20 m are randomly deployed in the \(100\times 100\) m\(^{2}\) simulation area. The results of 50 simulation runs were averaged, as shown in Fig. 7, where the \(x\)-axis is the percentage of awake sensors per unit time and the \(y\)-axis is the percentage of tracking errors. The following conclusions can be drawn. The lower bound due to the Dec-POP-SSA assumption is tight when only a few sensors are awake, because the Dec-POP-SSA assumption incorporates only observation errors, which dominate when few sensors are making observations. The FCR policy performs worst, since it uses a fixed belief rather than a real-time one. The QMDP policy uses MC sampling, but the samples are too dense, so its performance approaches that of FCR as the number of awake nodes increases. On the whole, the tracking errors of Dec-POP-SSA are lower than those of QMDP and FCR.

Fig. 7 The tracking errors of QMDP, FCR and Dec-POP-SSA

The basic reward when the basic connectivity and coverage are achieved is set to \(R_{0}=20\), the thresholds for Dec-POP-SSA are set to \(\varepsilon _{1}=\varepsilon _{2}=\delta =0.1\), and the discount factors are set to \(\gamma _{1}=\gamma _{2}=0.5\) in these experiments. The reward curves are shown in Fig. 8: the Dec-POP-SSA algorithm reaches the highest reward. The key reason is that its belief distribution is updated in real time and the sensors act collaboratively, so that approximately optimal benefits are achieved. The other algorithms assume a constant belief distribution and, under this assumption, take neither the real-time updates nor the collaboration into account.

Fig. 8 The rewards of QMDP, FCR and Dec-POP-SSA

7 Conclusion

Sleep scheduling is one of the most efficient strategies for achieving energy savings in GWSNs. In this paper, we proposed a novel scheme for sleep scheduling based on Dec-POMDP. A sleep scheduling algorithm with online planning (Dec-POP-SSA) based on Dec-POMDP was then presented, and a theoretical analysis of the upper bound for Dec-POP-SSA was given. The numerical results illustrate that Dec-POP-SSA obtains the highest reward over a multi-peak Gaussian event distribution when compared with the other approaches.

There are several avenues for energy saving in GWSNs, of which sleep scheduling and network coding are two major techniques. Compared with sleep scheduling, which saves energy by turning redundant sensors into sleep mode, network coding increases energy efficiency and reduces network congestion by combining packets destined for distinct users. How to reconcile these two techniques so that energy saving is achieved even more efficiently is worth studying. Second, the sleep scheduling problem can be examined in the presence of a sophisticated attacker in an unsafe network.