1 Introduction

Nowadays, wireless sensor networks (WSNs) can be considered one of the most important data sources of the big data era. In some applications, such as healthcare services, atmospheric condition monitoring [21] and commercial flights [22], the volume of data generated by sensor nodes reaches the order of petabytes every day. Moreover, the sensor nodes have a limited energy supply. Therefore, the problems of energy constraints and data redundancy inevitably emerge at the core of WSN challenges.

To deal with the big data generated in WSNs, recent studies [23,24,25] pay great attention to inter-node data correlation techniques and node scheduling strategies. The aim of such approaches is to switch sensors that generate highly spatially-temporally correlated data between sleep and active modes, thus enhancing the network lifetime. In this paper, we focus on a periodic real-time data collection model in WSNs based on a clustering architecture. On the one hand, the periodic model has been used over a wide range of areas [26, 27], where each sensor periodically collects local readings of interest and then transmits them toward the sink. On the other hand, recent works have mainly focused on the clustering architecture as an efficient WSN topology that organizes data traffic and improves the scalability of the network [28, 29]. Hence, our main goal in this paper is to minimize the huge amount of data generated in clustering-based periodic sensor networks by searching for spatial and temporal correlation between neighboring nodes. When correlated nodes are detected, we propose two scheduling strategies in order to switch sensors in each cluster between sleep and active modes. The first strategy is based on the set cover problem, while the second takes into account the correlation degree and the residual energy of the sensors when scheduling nodes in the cluster.

The remainder of this paper is organized as follows. Section 2 presents related work on data correlation and scheduling techniques in sensor networks. Section 3 describes our mechanism, based on the Euclidean distance, for searching for spatially-temporally correlated nodes. In Section 4, we propose two strategies for scheduling sensors in the network. Simulation results based on real data readings are presented in Section 5. Finally, we conclude the paper and outline directions for future work in Section 6.

2 Related work

In WSNs, many techniques have been proposed to explore the spatial-temporal data correlation between sensors [12,13,14]. The main objective of such techniques is to reduce the energy consumption of data collection and minimize the redundancy existing in the network. Recently, the authors in [10, 11] presented a comprehensive overview of the different spatial-temporal data correlation techniques and sleep scheduling methods proposed in the literature for WSNs.

Some works, such as [15,16,17,18], study the spatial-temporal correlation between sensor readings as part of the aggregation of similar data. In [15], the authors propose a spatial-temporal correlation based fault-tolerant event detection scheme, called STFTED, which leverages a two-stage decision fusion and exploits the spatial-temporal correlation of sensor nodes. In [17], the authors propose an architecture for dynamic and distributed data-aware clustering, and the Dynamic Data-aware Firefly-based Clustering (DDFC) protocol to handle spatial similarity between node readings. DDFC relies on the biological principles of fireflies to ensure distributed synchronization of the aggregation of similar readings within clusters.

In other works, such as [2, 5, 19, 20], the spatial-temporal correlation between sensors has been studied in order to schedule sensors in the network. In [5], the authors propose Efficient Data Collection Aware of Spatial-Temporal Correlation (EAST) for energy-aware data forwarding in WSNs. In EAST, nodes that detect the same event are dynamically grouped into correlated regions, and a representative node is selected in each correlation region to observe the phenomenon. In [2], the authors propose a centralized algorithm and an optimization protocol for scheduling the sensors during a specified network lifetime. The objective is to maximize the spatial-temporal coverage by scheduling sensor activity after the sensors have been deployed.

Recently, the authors in [9] propose a spatial-temporal model to extend the network lifetime based on three similarity metrics: Euclidean distance, Cosine similarity and the Pearson Product-Moment Coefficient (PPMC). They then propose a scheduling algorithm for switching correlated sensor nodes to the sleep mode. Through real experiments, the authors show that PPMC gives the best results in terms of conserving network energy compared to the other similarity metrics. However, PPMC has several disadvantages: 1) it does not search for similarity at the sensor node level; 2) it does not take into account the residual energy of the sensors when switching them to the sleep mode; 3) it assumes that all correlated sensors have the same degree of correlation. Hence, aiming to overcome these disadvantages, we propose in this paper a new mechanism based on the Euclidean distance for searching for inter-node data correlation. Once high inter-node correlation is detected, we propose two sleep/active strategies for scheduling sensors in the network. Through simulation, we show that our mechanism, with the two proposed strategies, can significantly outperform PPMC in terms of saving sensor energy and extending the network lifetime.

3 Spatial-temporal correlation mechanism

In WSNs, sensors are densely deployed in order to monitor some phenomenon, which leads to high spatial-temporal correlation between the sensed data. In this section, we propose a new mechanism, based on the Euclidean distance, to exploit the spatial-temporal correlation between sensed data in WSNs.

3.1 Local temporal correlation

In periodic applications, each sensor node collects a vector of readings in each period and then sends it to the CH at the end of the period. Usually, consecutive readings collected by a sensor in a period are temporally correlated, depending on how the monitored condition varies. We call this correlation a local temporal correlation. Let us consider a vector of readings Ri collected by the sensor Si during a period p as follows: \(R_{i} = \left [r_{1}, r_{2}, \dots , r_{\mathcal {T}-1}, r_{\mathcal {T}}\right ]\), where \(\mathcal {T}\) is the total number of readings captured during p. Our objective is to explore the temporal correlation between the readings in Ri in order to reduce the amount of data readings that need to be transmitted and thus save energy in Si. Hence, we propose the "LocTmp" function to identify whether two consecutive readings rt and rt+1, captured by the sensor Si during a period p, are similar or not. The LocTmp function is defined as follows:

Definition 1 (LocTmp function)

We define the LocTmp function between two consecutive readings rt and rt+1 as:

$$LocTmp(r_{t}, r_{t + 1}) = \left\{ \begin{array}{ll} 1& \quad \text{if} \quad |r_{t} - r_{t + 1}| \leq \delta \text{,} \\ 0 & \quad \text{otherwise}. \end{array} \right. $$

where the threshold δ is user defined and depends on the application. Two consecutive readings captured by a sensor are considered similar if and only if their LocTmp function is equal to 1.

In order to maintain a desired data accuracy for the transmitted data, we define the weight of a reading as follows:

Definition 2 (Reading's weight, wgt(rt))

The weight of a reading rt is defined as the number of equal or similar (according to the LocTmp function) consecutive readings previously collected in the same period.

Therefore, after searching for local temporal correlation, LocTmp allows Si to transform the initial vector of readings Ri into a set of readings \(R^{\prime }_{i}\) associated with their corresponding weights as follows: \(R^{\prime }_{i}\)=\(\{(r^{\prime }_{1},wgt(r^{\prime }_{1})), (r^{\prime }_{2},wgt(r^{\prime }_{2})), \dots , (r^{\prime }_{k},wgt(r^{\prime }_{k}))\}\), where \(k \leq \mathcal {T}\).

Based on the set \(R^{\prime }_{i}\), we provide the following definitions:

Definition 3 (Cardinality of the set \(R^{\prime }_{i}\), \(|R^{\prime }_{i}|\))

The cardinality of the set \(R^{\prime }_{i}\) is equal to the number of elements in \(R^{\prime }_{i}\), i.e. \(|R^{\prime }_{i}|=k\).

Definition 4 (Weighted Cardinality of the set \(R^{\prime }_{i}\), \(wgt_{c}(R^{\prime }_{i})\))

The weighted cardinality of the set \(R^{\prime }_{i}\) is equal to the sum of all reading weights in \(R^{\prime }_{i}\) as follows: \(wgt_{c}(R^{\prime }_{i})= {\sum }_{j = 1}^{|R^{\prime }_{i}|} wgt(r^{\prime }_{j})\) = \(\mathcal {T}\), where \(r^{\prime }_{j} \in R^{\prime }_{i}\).
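To make the above concrete, the following Java sketch illustrates the local temporal correlation step at a sensor node. The WeightedReading container and the method names are hypothetical; only the LocTmp rule (Definition 1) and the reading weights (Definition 2) come from the text above, and the representative of each run of similar readings is chosen here, by assumption, to be the first reading of the run.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the local temporal correlation step at a sensor node.
// A WeightedReading pairs a representative reading r'_j with its weight wgt(r'_j).
public class LocalTemporalCorrelation {

    public record WeightedReading(double value, int weight) {}

    // LocTmp(r_t, r_{t+1}) = 1 iff |r_t - r_{t+1}| <= delta (Definition 1).
    static boolean locTmp(double rt, double rtPlus1, double delta) {
        return Math.abs(rt - rtPlus1) <= delta;
    }

    // Transforms the vector R_i of T readings into the weighted set R'_i:
    // each run of consecutive similar readings is replaced by its first
    // reading together with the length of the run (Definition 2).
    static List<WeightedReading> compress(double[] readings, double delta) {
        List<WeightedReading> compressed = new ArrayList<>();
        if (readings.length == 0) return compressed;

        double current = readings[0];
        int weight = 1;
        for (int t = 1; t < readings.length; t++) {
            if (locTmp(readings[t - 1], readings[t], delta)) {
                weight++;                       // similar: increase wgt(r')
            } else {
                compressed.add(new WeightedReading(current, weight));
                current = readings[t];          // start a new run
                weight = 1;
            }
        }
        compressed.add(new WeightedReading(current, weight));
        return compressed;                      // the weights sum to T
    }

    public static void main(String[] args) {
        double[] Ri = {21.0, 21.02, 21.05, 21.4, 21.41, 22.0};
        // With delta = 0.07 the runs are {21.0, 21.02, 21.05}, {21.4, 21.41}, {22.0}
        System.out.println(compress(Ri, 0.07));
    }
}
```

For example, with δ = 0.07 the vector [21.0, 21.02, 21.05, 21.4, 21.41, 22.0] is compressed into {(21.0, 3), (21.4, 2), (22.0, 1)}, whose weights sum to \(\mathcal{T}= 6\), in line with Definition 4.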

3.2 Spatial correlation between sensors

In WSNs, data sensed by the sensor nodes are spatially correlated due to the dense deployment of the nodes. Hence, it is important to exploit the spatial correlation of data in the sensor network in order to reduce the energy consumption of the sensors, while conserving the integrity of these data.

Usually, a sensor node Si is represented by its position (xi, yi), its sensing range (Sr) and its transmission range (Tr). In this paper, we assume that all sensor nodes have the same sensing and transmission ranges. We then use the Euclidean distance (Eg) to calculate the geographical distance between two nodes Si and Sj as follows:

$$E_{g}(S_{i}, S_{j})=\sqrt{(x_{i}-x_{j})^{2} + (y_{i}-y_{j})^{2}} $$

After that, we define the neighboring nodes of Si:

Definition 5 (Neighbor)

Sj is a neighbor node of Si if the Euclidean distance between Si and Sj is at most twice the sensing range:

$$E_{g}(S_{i}, S_{j}) \leq 2\times S_{r} $$

Finally, we denote by Vi the set of neighbors of Si.

Generally, there are three main categories of techniques to search for spatial correlation between neighboring sensors. The first category, such as that proposed in [1, 2], exploits the overlap area between two sensor nodes (Fig. 1a). The second category, such as [3], calculates the spatial correlation based on the distance overlap between the sensors (Fig. 1b). The last category, such as [4], defines a number of primary points in the disk of the sensing range, then counts the number of points in the common area between the two sensors (Fig. 1c).

Fig. 1: Spatial correlation techniques between two sensors

In this paper, we focus on the second category of spatial correlation, which is simpler and more flexible than the other categories. However, in order to allow the spatial correlation constraint between sensors to be tightened, we define a spatial correlation threshold called Csp. Therefore, the spatial correlation between two sensors is defined as follows:

Definition 6 (Spatial correlation between two sensors)

Two given sensors Si and Sj, where Sj is a neighbor node of Si (i.e. Sj ∈ Vi), are spatially correlated if and only if:

$$ E_{g}(S_{i},S_{j}) \leq C_{sp} $$
(1)

where Csp is a threshold determined by the application, taking values in [0, 2 × Sr]. We then denote by \(V^{\prime }_{i}\) the set of all nodes spatially correlated with Si.

Based on Definition 6, we can dynamically change the threshold Csp depending on the criticality of the monitored environment: if the phenomenon is critical, the decision makers can decrease Csp in order to decrease the number of spatially correlated nodes for each node, i.e. \(|V^{\prime }_{i}|\) decreases; otherwise, the decision makers can increase Csp when the phenomenon is less critical.
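As an illustration, a minimal Java sketch of the neighbor and spatial correlation tests of Definitions 5 and 6 could look as follows; the Sensor record and the method names are hypothetical.

```java
// Sketch of the geographic distance, neighborhood and spatial correlation
// tests (Definitions 5 and 6). The Sensor record is a hypothetical container.
public class SpatialCorrelation {

    public record Sensor(int id, double x, double y) {}

    // Euclidean geographic distance E_g(S_i, S_j).
    static double eg(Sensor si, Sensor sj) {
        return Math.hypot(si.x() - sj.x(), si.y() - sj.y());
    }

    // Definition 5: S_j is a neighbor of S_i if E_g(S_i, S_j) <= 2 * Sr.
    static boolean isNeighbor(Sensor si, Sensor sj, double sensingRange) {
        return eg(si, sj) <= 2 * sensingRange;
    }

    // Definition 6: neighbors are spatially correlated if E_g(S_i, S_j) <= Csp,
    // with Csp in [0, 2 * Sr].
    static boolean spatiallyCorrelated(Sensor si, Sensor sj,
                                       double sensingRange, double csp) {
        return isNeighbor(si, sj, sensingRange) && eg(si, sj) <= csp;
    }
}
```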

3.3 Temporal correlation between sensors

In addition to the local temporal correlation within each sensor, the readings collected by nearby sensor nodes can also be temporally correlated. Searching for temporal correlation among sensor nodes amounts to finding the sensors that collect similar readings during the same period. Similarity metrics, such as the Euclidean and Cosine distances, can be used to identify sensor nodes that are temporally correlated. These metrics are generally applied at the CH level. Once a high temporal correlation between two sensors is found, the CH should schedule these sensors in order to remove the redundancy from the network. In this paper, we focus on the Euclidean distance, which is widely used in various domains.

In mathematics, the Euclidean distance is the straight-line distance between two data vectors. Let us first consider two data sets \(R^{\prime }_{i}\) and \(R^{\prime }_{j}\) generated by the two sensor nodes Si and Sj respectively during the same period p. Then, in order to compute the Euclidean distance between \(R^{\prime }_{i}\) and \(R^{\prime }_{j}\), we must re-expand the set \(R^{\prime }_{i}\) (resp. \(R^{\prime }_{j}\)) into a vector as follows:

$$R^{\prime}_{i}=\left[\underbrace{r^{\prime}_{1}, \dots, r^{\prime}_{1}}_{wgt(r^{\prime}_{1})\text{ times}}, \underbrace{r^{\prime}_{2}, \dots, r^{\prime}_{2}}_{wgt(r^{\prime}_{2})\text{ times}}, \dots, \underbrace{r^{\prime}_{k}, \dots, r^{\prime}_{k}}_{wgt(r^{\prime}_{k}) \text{ times}}\right] $$

where the expanded vectors have length \(|R^{\prime }_{i}|=|R^{\prime }_{j}|= \mathcal {T}\).

Finally, we can calculate the Euclidean distance between the two vectors \(R^{\prime }_{i}\) and \(R^{\prime }_{j}\) based on the following equation:

$$E_{d}(R^{\prime}_{i}, R^{\prime}_{j}) = \sqrt{\sum\limits_{k = 1}^{\mathcal{T}} \left(r^{\prime}_{i_{k}}-r^{\prime}_{j_{k}}\right)^{2}} \quad \text{where } r^{\prime}_{i_{k}} \in R^{\prime}_{i} \text{ and } r^{\prime}_{j_{k}} \in R^{\prime}_{j} $$
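Continuing the sketches above (and assuming the same hypothetical WeightedReading type), the re-expansion of a weighted set into a vector of length \(\mathcal{T}\) and the computation of Ed at the CH could be written as:

```java
import java.util.List;

// Sketch of the temporal distance computed at the CH between two weighted
// reading sets R'_i and R'_j (reusing the WeightedReading record above).
public class TemporalDistance {

    // Re-expands R'_i into a vector of length T by repeating each reading
    // wgt(r') times, as described above.
    static double[] expand(List<LocalTemporalCorrelation.WeightedReading> set, int T) {
        double[] vector = new double[T];
        int pos = 0;
        for (LocalTemporalCorrelation.WeightedReading wr : set) {
            for (int n = 0; n < wr.weight(); n++) {
                vector[pos++] = wr.value();
            }
        }
        return vector;   // pos == T since the weights sum to T (Definition 4)
    }

    // Euclidean distance E_d between the two expanded vectors.
    static double ed(double[] ri, double[] rj) {
        double sum = 0.0;
        for (int k = 0; k < ri.length; k++) {
            double diff = ri[k] - rj[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```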

3.3.1 Distance normalization

The normalization of data is an essential step when using distance functions. The objective of the normalization process is to scale all data vectors to the same range of variation, so that an exact comparison among these vectors can be performed. In this paper, we use Gaussian normalization to normalize the data generated by the sensors. First, we calculate the Euclidean distance for each pair of data vectors in the network:

$$\mathbb{E}_{d}=\{E_{d}(R^{\prime}_{1},R^{\prime}_{2}), E_{d}(R^{\prime}_{1},R^{\prime}_{3}),\dots, E_{d}(R^{\prime}_{N-1},R^{\prime}_{N})\} $$

where N is the total number of sensors. Then, we can apply the Gaussian normalization using the following formula:

$$ E^{\prime}_{d}(R^{\prime}_{i},R^{\prime}_{j})=\frac{E_{d}(R^{\prime}_{i},R^{\prime}_{j})-\overline{Y}}{6\times \sigma}+\frac{1}{2} $$
(2)

where \(\overline {Y}\) is the mean of all distances and σ is the standard deviation of the pairwise distances over all data.

Thus, \(R^{\prime }_{i}\) and \(R^{\prime }_{j}\) are said to be redundant if \(E^{\prime }_{d}(R^{\prime }_{i},R^{\prime }_{j}) \leq C_{tp}\), where \(C_{tp}\) is a user defined threshold for the temporal correlation.
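A possible Java sketch of this normalization step, assuming the list of all pairwise distances \(\mathbb{E}_{d}\) has already been computed, is shown below; it simply applies Eq. 2 and the redundancy test, and the class and method names are illustrative.

```java
import java.util.List;

// Sketch of the Gaussian normalization of the pairwise Euclidean distances
// (Eq. 2). The list 'distances' collects E_d over all pairs of data vectors.
public class DistanceNormalization {

    // E'_d = (E_d - mean) / (6 * sigma) + 1/2.
    // Assumes the distances are not all identical (sigma > 0).
    static double normalize(double ed, List<Double> distances) {
        double mean = distances.stream().mapToDouble(Double::doubleValue)
                               .average().orElse(0.0);
        double variance = distances.stream()
                .mapToDouble(d -> (d - mean) * (d - mean))
                .average().orElse(0.0);
        double sigma = Math.sqrt(variance);
        return (ed - mean) / (6 * sigma) + 0.5;
    }

    // Two reading sets are considered redundant if E'_d <= Ctp.
    static boolean temporallyCorrelated(double edNormalized, double ctp) {
        return edNormalized <= ctp;
    }
}
```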

3.4 Spatial-temporal correlation between sensors

In periodic sensor networks (PSNs), spatial-temporal correlation occurs when two nodes that are geographically close take similar readings during a period [5]. In this section, our objective is to find all pairs of sensors that are spatially-temporally correlated and then to switch, at a later time, some sensors to the sleep mode in order to reduce redundant sensing and communication.

Based on Eqs. 1 and 2, we say that two sensors Si and Sj, collecting the set of readings \(R^{\prime }_{i}\) and \(R^{\prime }_{j}\) respectively, are spatially-temporally correlated at the period p if and only if:

$$ E_{g}(S_{i},S_{j})\times E^{\prime}_{d}(R^{\prime}_{i}, R^{\prime}_{j}) \leq C_{sptp} $$
(3)

where \(C_{sptp}=C_{sp}\times C_{tp}\) and \(E_{g}(S_{i},S_{j}) \leq C_{sp}\) and \(E^{\prime }_{d}(R^{\prime }_{i}, R^{\prime }_{j}) \leq C_{tp}\). Algorithm 1 describes our technique to find pairs of sensors that are spatially-temporally correlated. The CH searches which neighbors of each sensor Si are spatially (line 6) and temporally (line 8) correlated with Si.
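The following Java sketch reflects one possible implementation of this pairwise search at the CH. It reuses the hypothetical helper classes from the previous sketches, and the data structures passed to the method (positions, expanded reading vectors, precomputed pairwise distances) are assumptions rather than part of Algorithm 1 itself.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pairwise search for spatially-temporally correlated sensors,
// run by the CH at the end of each period. expand(), ed(), normalize() and
// eg() reuse the sketches given above.
public class SpatioTemporalSearch {

    public record CorrelatedPair(int i, int j, double degree) {}

    static List<CorrelatedPair> findCorrelatedPairs(
            List<SpatialCorrelation.Sensor> sensors,
            double[][] expandedReadings,      // expandedReadings[i] = expanded R'_i
            List<Double> allPairwiseDistances,
            double csp, double ctp) {

        List<CorrelatedPair> pairs = new ArrayList<>();
        double csptp = csp * ctp;
        for (int i = 0; i < sensors.size(); i++) {
            for (int j = i + 1; j < sensors.size(); j++) {
                double eg = SpatialCorrelation.eg(sensors.get(i), sensors.get(j));
                if (eg > csp) continue;                       // spatial test (Eq. 1)
                double ed = TemporalDistance.ed(expandedReadings[i], expandedReadings[j]);
                double edNorm = DistanceNormalization.normalize(ed, allPairwiseDistances);
                if (edNorm > ctp) continue;                   // temporal test (Eq. 2)
                double degree = eg * edNorm;                  // correlation degree (Eq. 3)
                if (degree <= csptp) {
                    pairs.add(new CorrelatedPair(sensors.get(i).id(), sensors.get(j).id(), degree));
                }
            }
        }
        return pairs;
    }
}
```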

3.5 Selection of thresholds

Obviously, the efficiency of our technique is highly related to the selection of the thresholds δ, Csp and Ctp. In a realistic application, these thresholds allow us to define an appropriate level of service. For instance, increasing the values of these thresholds decreases the accuracy of the transmitted information but further reduces the amount of data sent. Conversely, the lower these thresholds are, the more relevant the decisions and analyses that can be made, but the LocTmp function becomes less efficient in reducing the amount of sent data. Therefore, selecting appropriate threshold values is essential in our technique. We believe that these values should be determined by the decision makers or experts depending on the application requirements. For instance, in disaster monitoring applications, such as volcano or seismic monitoring, the thresholds must be lower than in weather monitoring applications. Therefore, these parameters depend on the application criticality and the studied phenomenon. They can also be adapted online using context-aware sensing.

Algorithm 1: Searching for pairs of spatially-temporally correlated sensors

4 Sleep scheduling strategies

After finding all pairs of spatially-temporally correlated sensors within a cluster, we propose, in this section, two scheduling strategies that allow sensors to work alternately. The first strategy is based on the set cover problem, while the second takes into account the correlation degree and the residual energy of the sensors when searching for the set of active sensors. In each strategy, a set of sensor nodes is selected, based on some criteria, to collect data in the network while the other sensors are switched into sleep mode.

4.1 Set cover (SC) strategy

The first strategy for scheduling sensor nodes is based on the Set Cover (SC) problem. In general, the SC problem consists in organizing sensor nodes into mutually exclusive subsets that are activated successively, where each subset ensures the coverage of the area of interest. Real-world applications of the SC problem include railway and airline crew scheduling, network discovery and phasor measurement unit placement [6]. In our case of PSNs, we apply SC over the set of correlated sensors in order to divide the sensors into disjoint subsets, such that every subset is able to completely cover the whole area of interest.

The SC problem can be formally defined in this paper as follows:

Given a set of N sensors \(\mathbb {S}=\{S_{1}, S_{2}, \dots , S_{N}\}\) and the list \(\mathbb {L}=\{(S_{i},S_{j})/ E_{g}(S_{i},S_{j})\times E^{\prime }_{d}(R^{\prime }_{i}, R^{\prime }_{j}) \leq C_{sptp}\}\) of all pairs of spatially-temporally correlated sensors, the goal of SC is to find the list 𝔻 containing the subsets 𝕏 ⊆ 𝕊, where each 𝕏 covers, in terms of spatial-temporal correlation, all the sensors in 𝕊. In other words, the sensor nodes are organized into mutually exclusive subsets 𝕏, where each 𝕏 ensures the coverage of the whole area of interest. These subsets are then activated successively.

Theoretically, the SC can be formulated as a binary integer programming problem as follows:

$$\begin{array}{lll} \textit{Minimize} & \displaystyle\sum\limits_{j = 1}^{N} S_{j} &\\ \textit{Subject to}& \displaystyle\sum\limits_{j = 1}^{N} a_{ij}S_{j} \geq 1, & i= 1, \dots, N\\ & S_{j} \in \{0,1\}, & j= 1,\dots, N \end{array} $$

where the binary variable Sj equals 1 if the sensor Sj is selected to be active, and aij = 1 if Sj is spatially-temporally correlated with Si or j = i (so that every sensor is covered either by itself or by one of its correlated neighbors), and aij = 0 otherwise.

Our strategy operates in rounds, where each round is composed of |𝔻| successive periods and |𝔻| is the number of disjoint subsets found after applying the set cover technique. In each period, only one subset is activated while the remaining nodes go to sleep mode. After each round, in which all the subsets have been successively activated for one period each, the list of subsets is updated based on the new spatio-temporal correlation between nodes.
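Purely for illustration, the following greedy Java sketch builds disjoint covering subsets from the list of correlated pairs. It is an approximation of the binary integer program above (which we solve exactly [7]), not the exact method, and all class and method names are hypothetical.

```java
import java.util.*;

// Rough greedy sketch of the SC strategy: build disjoint subsets of sensors,
// each covering every constraint "S_i or one of its correlated neighbors is
// active". This is an approximation for illustration only.
public class SetCoverScheduling {

    static List<Set<Integer>> buildDisjointCovers(int n, List<int[]> correlatedPairs) {
        // coverage[j] = set of constraints (sensor indices) that sensor j satisfies
        List<Set<Integer>> coverage = new ArrayList<>();
        for (int j = 0; j < n; j++) {
            coverage.add(new HashSet<>(List.of(j)));   // S_j always covers its own constraint
        }
        for (int[] p : correlatedPairs) {
            coverage.get(p[0]).add(p[1]);
            coverage.get(p[1]).add(p[0]);
        }

        List<Set<Integer>> covers = new ArrayList<>();
        Set<Integer> unused = new HashSet<>();
        for (int j = 0; j < n; j++) unused.add(j);

        while (!unused.isEmpty()) {
            Set<Integer> cover = new HashSet<>();
            Set<Integer> uncovered = new HashSet<>();
            for (int i = 0; i < n; i++) uncovered.add(i);

            // Greedily pick the unused sensor that covers the most uncovered constraints.
            while (!uncovered.isEmpty()) {
                int best = -1, bestGain = 0;
                for (int j : unused) {
                    if (cover.contains(j)) continue;
                    int gain = 0;
                    for (int i : coverage.get(j)) if (uncovered.contains(i)) gain++;
                    if (gain > bestGain) { bestGain = gain; best = j; }
                }
                if (best == -1) return covers;    // remaining unused sensors cannot form a cover
                cover.add(best);
                uncovered.removeAll(coverage.get(best));
            }
            covers.add(cover);
            unused.removeAll(cover);              // keep the subsets disjoint
        }
        return covers;
    }
}
```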

Illustrative example

We consider a set of 6 sensors: \(\mathbb {S}=\{S_{1}, S_{2}, S_{3}, S_{4}, S_{5}, S_{6}\}\), with the list of spatially-temporally correlated sensors: \(\mathbb {L}= \{(S_{1},S_{2}), (S_{1},S_{3}), (S_{1},S_{4}), (S_{1},S_{5}), (S_{2},S_{6}), (S_{3},S_{4}), (S_{3},S_{6}), (S_{4},S_{5})\}\). This leads to the following mathematical formulation of the SC problem:

$$\begin{array}{lllllllllllll} \textit{Minimize:} & S_{1} & + & S_{2} & + & S_{3} & + & S_{4} & + & S_{5} & + & S_{6} & \\ \textit{Subject to:} & S_{1} & + & S_{2} & + & S_{3} & + & S_{4} & + & S_{5} & & & \geq 1 \\ & S_{1} & + & S_{2} & & & & & & & + & S_{6} & \geq 1 \\ & S_{1} & & & + & S_{3} & + & S_{4} & & & + & S_{6} & \geq 1 \\ & S_{1} & & & + & S_{3} & + & S_{4} & + & S_{5} & & & \geq 1 \\ & S_{1} & & & & & + & S_{4} & + & S_{5} & & & \geq 1 \\ & & & S_{2} & + & S_{3} & & & & & + & S_{6} & \geq 1 \end{array} $$

By solving the SC problem [7], we can find two feasible schedulings, where every sensor \(S_{i}\) equals 1 (i.e. is in active mode) in at most one scheduling:

  • Scheduling 1: \(S_{1}\,=\,S_{6}\,=\,1\) and \(S_{2}=S_{3}\,=\,S_{4}=S_{5}\,=\,0\).

  • Scheduling 2: \(S_{2}\,=\,S_{4}\,=\,1\) and \(S_{1}=S_{3}=S_{5}\,=\,S_{6}\,=\,0\).

Therefore, we can divide 𝕊 into two disjoint subsets of sensors: \(\mathbb {D}=\{D_{1}=\{S_{1},S_{6}\}, D_{2}=\{S_{2}, S_{4}\}\}\). Consequently, by applying our SC strategy, the current round will consist of three periods: all the sensors are active in the first period, and in each of the following periods only the sensors of one subset Di are active. Figure 2 shows the active sensors in each period of Round 1.

Fig. 2: Active sensors during periods in the round

At the beginning of each period, the CH is responsible for sending notifications to activate or switch off the nodes. Finally, we note that the number of periods in a round can be fixed based on the temporal variations and observations.

4.2 Correlation degree and residual energy (CDRE) strategy

In this section, we propose a second strategy for scheduling in sensor networks based on spatio-temporal correlation while taking into account the residual energy. We call this strategy Correlation Degree and Residual Energy (CDRE) strategy.

The CDRE strategy also operates in rounds, where each round is fixed to two periods. In the first period of each round, the CH determines, based on the CDRE strategy, the set of sensors that will be active in the second period. For this, we consider the following notations:

  • The list of spatially-temporally correlated sensors with their correlation degrees: \(\mathbb {L}=\{\left ((S_{i},S_{j}), C_{ij}(S_{i},S_{j})\right ) \text{ such that } C_{ij}(S_{i},S_{j})=E_{g}(S_{i},S_{j})\times E^{\prime }_{d}(R^{\prime }_{i},R^{\prime }_{j}) \text{ and } C_{ij}(S_{i},S_{j}) < C_{sptp}\}\).

  • \(E_{r_{i}}\), the residual energy of the sensor Si.

The CDRE strategy is described in Algorithm 2. First, we order the pairs of sensors by increasing value of their spatial-temporal correlation degree; since a smaller degree indicates a stronger correlation, we start with the pair of sensors having the strongest correlation (line 2). Then, during the first period of each round, for each pair of nodes (Si, Sj), we select the sensor with the higher residual energy to be active in the second period, while the other one goes to sleep mode (lines 4-14). The objective of this part of the algorithm is to select, among the correlated nodes, the nodes with the highest remaining energy to be activated in the next period. In a second step, the nodes that do not have any correlation with other sensors must stay in active mode for the next rounds (lines 15-19). After that, all the nodes in the set 𝔼 send their readings of the first period to the sink and remain active for the next period, while the others do not send their readings and go to sleep mode for the next period. The objective here is to remove the redundancy among the data sent to the sink in the first period, contrary to the SC strategy, which sends all the readings sets.

Algorithm 2: Correlation Degree and Residual Energy (CDRE) scheduling strategy
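As a complement to Algorithm 2, the following Java sketch shows one possible implementation of the CDRE selection performed by the CH during the first period of a round. The data structures and names are hypothetical, and ties in residual energy are broken arbitrarily.

```java
import java.util.*;

// Sketch of the CDRE selection performed by the CH in the first period of a
// round. 'pairs' holds the spatially-temporally correlated pairs with their
// degrees C_ij, and residualEnergy[i] holds the residual energy of sensor i.
public class CdreScheduling {

    public record CorrelatedPair(int i, int j, double degree) {}

    static Set<Integer> selectActiveSensors(int n, List<CorrelatedPair> pairs,
                                            double[] residualEnergy) {
        // Order by increasing degree C_ij (smaller product = stronger correlation).
        List<CorrelatedPair> ordered = new ArrayList<>(pairs);
        ordered.sort(Comparator.comparingDouble(CorrelatedPair::degree));

        Set<Integer> active = new HashSet<>();   // the set E of active sensors
        Set<Integer> asleep = new HashSet<>();

        for (CorrelatedPair p : ordered) {
            if (asleep.contains(p.i()) || asleep.contains(p.j())) continue;
            if (active.contains(p.i()) && active.contains(p.j())) continue;
            if (active.contains(p.i())) { asleep.add(p.j()); continue; }
            if (active.contains(p.j())) { asleep.add(p.i()); continue; }
            // Neither sensor decided yet: keep the one with the higher residual energy.
            if (residualEnergy[p.i()] >= residualEnergy[p.j()]) {
                active.add(p.i()); asleep.add(p.j());
            } else {
                active.add(p.j()); asleep.add(p.i());
            }
        }
        // Sensors with no (remaining) correlated partner stay active.
        for (int s = 0; s < n; s++) {
            if (!asleep.contains(s)) active.add(s);
        }
        return active;   // active sensors for the second period of the round
    }
}
```

Applied to the illustrative example below, this sketch yields 𝔼 = {S1, S5, S6}, matching the step-by-step walkthrough.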

Illustrative example

Recall the sensors S1 to S6 in the set 𝕊 from the example above, with the ordered list of correlated pairs and their degrees: \(\mathbb {L}= \{((S_{1},S_{3}), C_{1,3}= 0.5), ((S_{4},S_{5}), C_{4,5}= 0.9), ((S_{3},S_{4}), C_{3,4}= 1.8), ((S_{3},S_{6}), C_{3,6}= 2.1), ((S_{1},S_{5}), C_{1,5}= 2.5), ((S_{1},S_{2}), C_{1,2}= 2.9), ((S_{1},S_{4}), C_{1,4}= 3.3), ((S_{2},S_{6}), C_{2,6}= 3.5)\}\). Then, we consider that the sensors have the following residual energies at the beginning of round i: \(E_{r_{1}}= 8.1\, mJ\), \(E_{r_{2}}= 8.3\, mJ\), \(E_{r_{3}}= 7.6\, mJ\), \(E_{r_{4}}= 6.9\, mJ\), \(E_{r_{5}}= 7.8\, mJ\), \(E_{r_{6}}= 7.9\, mJ\).

  • Step 1: We start with the correlated pair (S1, S3). Since S1 has more energy than S3, S3 will be switched to the sleep mode in the next period while S1 will be added to the list of active sensors: \(\mathbb {E}=\{S_{1}\}\). Then, we remove the pairs of sensors that contain S3, i.e. (S3, S4) and (S3, S6). The remaining elements are \(\mathbb {L}=\{((S_{4},S_{5}), C_{4,5}= 0.9), ((S_{1},S_{5}), C_{1,5}= 2.5), ((S_{1},S_{2}), C_{1,2}= 2.9), ((S_{1},S_{4}), C_{1,4}= 3.3), ((S_{2},S_{6}), C_{2,6}= 3.5)\}\).

  • Step 2: The first element in 𝕃, i.e. (S4, S5), is treated similarly to (S1, S3): we add S5 to the list of active sensors and switch S4 to the sleep mode, then we remove all elements that contain S4 from 𝕃. Therefore, \(\mathbb {E}=\{S_{1}, S_{5}\}\) and \(\mathbb {L}=\{((S_{1},S_{5}), C_{1,5}= 2.5), ((S_{1},S_{2}), C_{1,2}= 2.9), ((S_{2},S_{6}), C_{2,6}= 3.5)\}\).

  • Step 3: Since S1 and S5 are both in 𝔼, we remove the pair (S1, S5) from 𝕃 because both will be in active mode. Therefore, \(\mathbb {L}=\{((S_{1},S_{2}), C_{1,2}= 2.9), ((S_{2},S_{6}), C_{2,6}= 3.5)\}\).

  • Step 4: Independently of the residual energies of the sensors S1 and S2, S2 should be switched to the sleep mode because S1 is already selected as an active sensor for the next period. Hence, we remove from 𝕃 the elements that contain S2: 𝕃 = {}.

  • Step 5: We add the sensor S6 to the set 𝔼 since it does not have any correlated sensor in 𝔼.

Finally, the set of active sensors and the readings sets sent from the CH to the sink, at each period of round i, are shown in Fig. 3a and b respectively. In the first period, all the sensors are active, while the CH only sends the sets that are not redundant, i.e. those of the sensors in 𝔼. On the other hand, all readings sets coming from the active sensors will be sent to the sink in the second period of the round.

Fig. 3: Illustrative example of the active sensors and their readings sets during a round

5 Simulation results

In this section, we evaluate the performance of our spatial-temporal correlation mechanism under the two proposed scheduling strategies. We implemented both strategies in a Java-based simulator. We ran the simulator on real temperature readings collected by 46 sensors and provided by the Intel Berkeley Research lab [8]. Figure 4 shows a map of the placement of the sensors in the lab. The sensors are Mica2Dot nodes with weather boards that collect temperature values once every 31 seconds. We assume that the network is divided into two clusters, with CH1 and CH2 as cluster-heads respectively, and that the sensors periodically send their data to their respective cluster-head. We compare our results to those of PPMC proposed in [9]. Table 1 shows the parameters used in our simulations.

Fig. 4: Distribution of sensors and CHs in the Intel Laboratory

Table 1 Simulation environment

5.1 Performance evaluation at sensor node

In this section, we evaluate the performance of our mechanism with the SC and CDRE strategies at the sensor node level, compared to PPMC [9] and the naïve method (i.e. the classic method in which all readings collected by the sensors are sent to the sink without any processing). We present the results for only one cluster, i.e. the second cluster (CH2), whenever a performance metric gives similar results for the two clusters. In addition, the results for each metric shown in the following figures represent the average over all sensors in each cluster. At this level, we consider three performance metrics.

5.1.1 Percentage of data readings sent from each sensor to its CH

In this section, our objective is to show how our mechanism decreases the data readings collected by each sensor node and then sent to CH2. Figure 5 shows the percentage of data collected, and then sent, by each sensor node when varying one parameter at a time and fixing the others, as shown in Fig. 5a to e. The obtained results show that PPMC reduces the data sent to CH2 by 25% to 33%, while our mechanism with the SC and CDRE strategies reduces the data sent by up to 93% and 90% respectively, compared to the naïve technique, which always sends all collected data (i.e. 100%). This means that our approach can effectively eliminate the redundancy in data collection by identifying all sensors that generate spatially-temporally correlated data. Furthermore, we can notice that the SC strategy gives better results than the CDRE strategy, in terms of reducing the data sent by each sensor, in all cases.

Fig. 5: Percentage of data readings sent from each sensor to the CH2

Several observations can be made based on the results in Fig. 5:

  • By increasing the threshold δ in Fig. 5a, each sensor reduces the readings sent to CH2 by up to 90% and 87% using the SC and CDRE strategies respectively, compared to PPMC. These results are due to the fact that LocTmp finds more similar readings when δ increases.

  • By increasing its sensing range, as shown in Fig. 5b, each sensor sends fewer readings to CH2 using the two proposed strategies. For instance, when Sr increases from 5 to 20, a sensor node decreases its readings sent from 11.4% to 8.4% using the SC strategy. This happens because, when Sr increases, each sensor has more neighboring, thus correlated, sensors. Consequently, more sensors are switched to the sleep mode, decreasing the percentage of collected and sent readings.

  • By increasing \(\mathcal {T}\) from 200 to 1000 in Fig. 5c, the percentage of readings sent decreases using the SC and CDRE strategies while it increases using PPMC. The reason is that the δ threshold used by LocTmp in the two strategies finds, and then eliminates, more redundancy when \(\mathcal {T}\) increases. In contrast, PPMC does not apply any processing to the collected data, which increases the readings sent to CH2 when \(\mathcal {T}\) increases.

  • By decreasing the spatial correlation threshold (Csp) in Fig. 5d, the percentage of readings sent to CH2 increases in the three approaches, i.e. SC, CDRE and PPMC. This result is logical since the constraint for the neighboring sensors becomes more difficult to satisfy (see Definition 6). Consequently, the number of active sensors in each period tends to increase. It is also important to notice that, in all cases, our strategies reduce the readings sent to CH2 compared to those sent using PPMC.

  • By increasing the temporal correlation threshold (Ctp) in Fig. 5e, the SC and CDRE strategies allow each sensor node to decrease the data readings it sends to CH2. This is because the condition on the Euclidean distance between sets of readings is more easily satisfied; therefore, more sensors are switched to the sleep mode.

5.1.2 Lifetime of the sensor node

In this section, our objective is to study the energy consumption at the sensor node level. Therefore, we fixed the initial energy of all sensor nodes to \(E_{i}\). Then, we applied our strategies, PPMC and the naïve approach while varying, each time, one parameter and fixing the others, as done in Fig. 5. Figure 6 shows the lifetime of each sensor in terms of the number of periods during which the sensor is operational, i.e. its residual energy is positive. The obtained results clearly show that our mechanism, with the proposed strategies, can efficiently reduce the energy consumption of the sensor and extend its lifetime. This is because our mechanism eliminates the redundancy among collected data and reduces the readings sent to the CH (see Fig. 5). Although PPMC can extend, in the best case, the lifetime of a sensor by a factor of two compared to the naïve approach, our strategies significantly outperform PPMC. We can also notice that the SC strategy gives better results, in terms of keeping the sensor node operational for a long time, than the CDRE strategy.

Fig. 6: Lifetime of each sensor in the second cluster (CH2)

In WSNs, the energy consumption of a sensor node is proportional to the amount of data sent by the sensor. Consequently, when the sensor sends more data to the CH, it consumes more energy, and vice versa. Hence, the observations made on the results of Fig. 5 hold similarly for the energy consumption of the sensor in Fig. 6. Table 2 shows by how many times the sensor node can extend its lifetime using our strategies, in the worst and the best cases, for each varied parameter shown in Fig. 6a to e, compared to the PPMC and naïve approaches.

Table 2 Lifetime comparisons between our strategies, PPMC and Naïve approaches. (Worst case → Best case)

5.1.3 Variation of the state and the energy of the sensor during periods

In this section, we show an example of the variation of a sensor's activity over the periods when applying our strategies, PPMC and the naïve approach. We take the sensor with id 35, located in the second cluster, and study the variation of its state, i.e. active or sleep, and of its residual energy over the periods, for the fixed parameters shown in Fig. 7. Based on the results of Fig. 7a, we can see that, when applying our strategies, the state of the sensor varies between 1 (i.e. active mode) and 0 (i.e. sleep mode) more dynamically than with the other techniques. Our strategies also confirm the efficient reduction of the redundancy between the sensors correlated with sensor 35 by switching it to the sleep mode more often than the other techniques do. On the other hand, Fig. 7b shows how the residual energy of the sensor varies depending on its state: if the sensor is in active mode, its residual energy decreases as it collects data and sends it to the CH; otherwise, it remains constant until the next period. We can also observe that, using the SC strategy, the residual energy of the sensor can remain constant over several successive periods, e.g. from periods 18 to 22 in Fig. 7b. This happens because the number of periods in each round changes dynamically with the SC strategy, where the sensor can be active in at most two periods of a round.

Fig. 7: Variation of sensor activity during periods (sensor id = 35, \(\mathcal {T}= 500\), Sr = 15, δ = 0.07, Csp = 2 × Sr, Ctp = 0.45)

5.2 Performance evaluation at CH nodes

In this section, we evaluate the performance of our approach at the CH level. We consider four performance metrics.

5.2.1 Data accuracy

Scheduling sensor nodes in the network without losing the integrity of the information is an important challenge in WSNs. Data accuracy is usually represented by the "data loss rate" measure, i.e. an evaluation of the readings taken by the sensor nodes whose values (or similar values) do not reach the sink. Figure 8 shows the data accuracy results for SC, CDRE and PPMC for the different parameter values considered in our simulation. We can observe that our strategies always give better data accuracy results than PPMC. This is because the Pearson coefficient used in PPMC calculates the distance between two data sets based on the summation of readings, while the Euclidean distance used in our strategies calculates the distance between every two readings in the data sets. This makes the data loss in our strategies lower than that of PPMC. We can also notice that the data accuracy results using the CDRE strategy are better, in most cases, than those obtained using the SC strategy. The reason is that, with both strategies, a sensor sends at most two data sets in a round, while a round in SC contains more periods than a round in CDRE (see the illustrative examples for the SC and CDRE strategies).

Fig. 8: Data accuracy at the CH2

In general, the data accuracy depends on the percentage of data sent by the sensors (see the results in Fig. 5) and on the number of active sensors during the periods: when the data sent to the sink or the number of active sensors increases, the data accuracy increases. Therefore, the following observations can be made based on the results of Fig. 8: (1) the data loss measure increases when the sensing range of the sensor (Sr) or the temporal correlation threshold (Ctp) increases (Fig. 8b and e); (2) the data accuracy increases when the number of readings collected during a period (\(\mathcal {T}\)) increases or the spatial correlation threshold (Csp) decreases (Fig. 8c and d).

5.2.2 Variation of periods number during rounds

In this section, we show how the number of periods changes, using our proposed strategies, after each round for the two clusters in the network, for some fixed parameters. Using the SC strategy, the CH calculates, at the beginning of each round, the maximum number of periods for the current round based on the set covering problem. In contrast, the number of periods is always equal to 2 per round using the CDRE strategy. The obtained results for the two clusters, represented by their cluster-heads CH1 and CH2, are shown in Fig. 9a and b respectively. While each round always consists of two periods using CDRE, the number of periods varies dynamically in each round using SC, as shown in the figures. We can also observe that: (1) a round can contain up to 7 periods using the SC strategy, which reflects the high level of redundancy existing in the network and which SC can efficiently eliminate; (2) the sensors in the first cluster are more spatially-temporally correlated than those in the second cluster, which leads both strategies to extend the lifetime of the first cluster more than that of the second cluster.

Fig. 9: Variation of the number of periods in each round (\(\mathcal {T}= 500\), Sr = 15, δ = 0.07, Csp = 2 × Sr, Ctp = 0.45)

5.2.3 Variation of active sensors number during periods

In this section, our main goal is to show how our strategies schedule the activities of the sensor nodes in the two clusters. Figure 10 shows the number of active sensors in each cluster and in each period using the SC and CDRE strategies, for the fixed parameters shown in the figure. The number of active sensors can affect the lifetime of the network, the data latency and the coverage of the monitored area. As we can see, each strategy successfully and dynamically schedules the sensor nodes in each cluster after each period according to its own scheduling mechanism. We can notice that the SC strategy reduces the number of active sensors in each cluster at each period to the minimum, while the CDRE strategy selects the set of active sensors that balances the energy distribution in each cluster. Consequently, the obtained results confirm the proper behavior of our strategies.

Fig. 10: Variation of the number of active sensors during each period (\(\mathcal {T}= 500\), Sr = 15, δ = 0.07, Csp = 2 × Sr, Ctp = 0.45)

5.2.4 Coverage variation during periods

Conserving the network energy while preserving the maximal coverage of the region of interest is an important challenge in WSNs. In Fig. 11, we show how much of the area of each cluster is covered after each period by applying our strategies. The sensing range of a sensor is varied from 10 in Fig. 11a and b to 15 in Fig. 11c and d, while the other parameters remain fixed. The obtained results show that the two proposed strategies provide sufficient coverage for the clusters during each period. Therefore, we can consider that our mechanism with the two proposed strategies can efficiently extend the network lifetime while preserving the integrity of data and the coverage of the observed area.

Fig. 11: Coverage ratio for each cluster (\(\mathcal {T}= 500\), δ = 0.07, Csp = 2 × Sr, Ctp = 0.45)

Based on the results in Fig. 11, several observations can be made:

  • the CDRE strategy provides more coverage for the two clusters than the SC strategy. This is because the number of active sensors in each period using the CDRE strategy is greater than that using the SC strategy.

  • the coverage ratio for each cluster increases when the sensor sensing range increases.

5.3 Further discussions

In this section, we give further consideration to our proposed mechanism. We compare the results obtained with the SC and CDRE strategies and give some guidance on which strategy to choose depending on the conditions and requirements of the application.

From the sensor lifetime point of view, both the SC and CDRE strategies significantly improve the lifetime of the sensor (Fig. 6). However, SC allows the sensor to extend its lifetime further, by 7% to 45%, compared to CDRE. Therefore, if the application needs to conserve energy and extend the network lifetime as long as possible, the SC strategy is more suitable.

From the data accuracy point of view, CDRE preserves, in most cases, the integrity of the collected data better than SC. This is because the number of active sensors in each period using CDRE is greater than that in SC, which increases the accuracy of the data sent to the sink. Consequently, when the priority of the application is to ensure a high level of data accuracy, CDRE is more suitable.

From the coverage point of view, CDRE can practically cover the whole monitored area during all periods of the network lifetime, while SC ensures a satisfactory coverage, i.e. more than 70% in most cases, of the network area. Hence, if the application does not permit flexibility regarding the coverage of the network, CDRE is more suitable.

6 Conclusion & future work

In this paper, we proposed an efficient mechanism to search for spatial and temporal correlation between the data collected by the sensors in a periodic sensor network. Then, to allow sensors to work alternately, we proposed two scheduling strategies that switch the sensors between sleep and active modes across the periods. The first strategy, called SC, is based on the set cover problem, while the second, called CDRE, takes into account the correlation degree and the residual energy of the sensors when scheduling the network. Through simulations on real data readings, we demonstrated the efficiency of our mechanism, under the two proposed strategies, in terms of extending the network lifetime while conserving the quality of the collected data and the coverage of the monitored area.

As future work, we will adapt our proposed mechanism to take into consideration reactive periodic sensor networks, where sensor nodes operate with different sampling rates. In periodic applications, the dynamics of the monitored condition or process can slow down or speed up; the sensor node can further save energy by adapting its sampling rate to the changing dynamics of that condition or process.