1 Introduction

Wireless sensor networks (WSNs) have been successfully applied in many fields [1], such as surveillance, environmental monitoring, smart cities, and health care. In these applications, large numbers of sensor nodes perceive information about various monitored objects and send it to the base station and end-users [2]. Meanwhile, the requirements on data collection and data fusion are becoming increasingly demanding.

In big-data WSNs, the sensed data exhibit substantial similarity and redundancy [3], and the spatial and temporal correlation of the data leads to large numbers of redundant transmissions and wasted node energy [4]. As a result, clustered WSNs are the prominent network architecture for data reduction strategies [5]. In particular, each cluster applies a specific data compression strategy at the cluster head [6]. A number of compression methods have been proposed, such as compressed sensing (CS) [7]. However, due to the limited storage space and computation capacity of sensors, CS is inefficient for data collection at cluster head nodes. Data prediction has therefore been discussed as a more efficient way to achieve data reduction in WSNs [8]. An autoregressive (AR) model was used to estimate future perceived data, but its shortcoming is that the communication cost becomes very high when the error threshold is set to a small value. Afterward, the GM-OP-ELM scheme, which combines the gray model (GM) and the optimally pruned extreme learning machine (OP-ELM), was proposed [9], and [10] considered the GM-KRLS scheme, which consists of GM and kernel recursive least squares (KRLS). Both of these algorithms are based on gray-model prediction. The online sequential extreme learning machine (OS-ELM) is known to be a viable solution for time-series prediction [11], and recently a new variant of OS-ELM, the online recurrent extreme learning machine (OR-ELM), was proposed [12].

Inspired by the above work, a data fusion scheme for WSNs using clustering and the OR-ELM prediction algorithm is proposed. First, nodes are clustered by considering both the distance between nodes and the trend similarity between their data series, and strong links are established between them. Second, the OR-ELM prediction algorithm is used to reduce the number of transmitted sampling points. Third, nodes whose predictions fail locate similar nodes through their links, and only one node in a link transmits the data. Finally, experimental results show that the proposed data fusion scheme not only predicts the sensor data effectively, but also reduces spatial and temporal redundant transmissions with low computational cost.

The rest of this paper is organized as follows: Sect. 2 introduces the proposed scheme, Sect. 3 presents some experimental results, and Sect. 4 summarizes this work.

2 Description of the Proposed Data Fusion Scheme

As shown in Fig. 1, the main steps of the proposed scheme are as follows:

Fig. 1. The flowchart of the proposed scheme.

Step 1: Through the historical data sequence, the similarity of nodes is calculated. Then, a similar node cluster is obtained using the clustering algorithm.

The sensing objects include temperature, humidity, illumination, and gas concentration, so the data sequence \(\mathbf{S}_i\) of node i can be represented as an array:

$$\begin{aligned} \mathbf{S}_i = \left[ {a_i}\left( 1 \right) , \cdots , {a_i}\left( t \right) ,\; {b_i}\left( 1 \right) , \cdots , {b_i}\left( t \right) ,\; \cdots ,\; {n_i}\left( 1 \right) , \cdots , {n_i}\left( t \right) \right] , \end{aligned}$$
(1)

where \({a_i}\left( t \right) \), \({b_i}\left( t \right) \), and \({n_i}\left( t \right) \) represent the different attributes sampled by node i at time slot t. To obtain a better clustering result, all data with the same attribute are normalized to the range [0, 1] according to

$$\begin{aligned} {s_t} = \left( {{s_t} - {s_{\min }}} \right) /\left( {{s_{\max }} - {s_{\min }}} \right) , \end{aligned}$$
(2)

where \(s_{\min }\) and \(s_{\max }\) are the minimum and maximum of the sampled data \(s_{t}\) during the period [0, t]. Then, the Euclidean distance between two nodes is obtained as follows:

$$\begin{aligned} \varPhi \left( {x,y} \right) = \sqrt{{{\left( {{x_1} - {y_1}} \right) }^2} + {{\left( {{x_2} - {y_2}} \right) }^2} + \cdots + {{\left( {{x_n} - {y_n}} \right) }^2}} = \sqrt{\sum \limits _{i = 1}^n {{{\left( {{x_i} - {y_i}} \right) }^2}} }, \end{aligned}$$
(3)

where \(\varPhi \left( {x,y} \right) \) measures the similarity between nodes x and y (a smaller distance means greater similarity), and \({x_i}\) and \({y_i}\) are the corresponding sampled data. Next, the k-means method [13] is improved and applied here based on the following rules (a sketch follows the list):

  1. (R1) Randomly select k initial groups (clusters).

  2. (R2) Calculate the similarity between each point and each cluster center.

  3. (R3) Assign the points whose distance to a center is less than a threshold to that cluster.

  4. (R4) Update the cluster centers and repeat (R2) and (R3) until the centers no longer change.
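
To make the clustering step concrete, the following Python sketch implements the normalization of Eq. (2), the distance of Eq. (3), and rules (R1)-(R4). It is a minimal illustration, not the authors' code: the function names, the threshold parameter `tau`, and the handling of unassigned points are assumptions.

```python
import numpy as np

def normalize(S):
    """Min-max normalize each attribute (column) to [0, 1], as in Eq. (2)."""
    s_min, s_max = S.min(axis=0), S.max(axis=0)
    return (S - s_min) / (s_max - s_min + 1e-12)  # epsilon avoids division by zero

def similarity(x, y):
    """Euclidean distance of Eq. (3); a smaller value means greater similarity."""
    return np.sqrt(np.sum((x - y) ** 2))

def threshold_kmeans(S, k, tau, max_iter=100, seed=0):
    """Cluster node data sequences S (num_nodes x seq_len) per rules (R1)-(R4).
    tau is the similarity threshold of rule (R3); its value is an assumption."""
    rng = np.random.default_rng(seed)
    centers = S[rng.choice(len(S), size=k, replace=False)]  # (R1) random initial centers
    labels = np.full(len(S), -1)
    for _ in range(max_iter):
        for i, s in enumerate(S):                           # (R2) similarity to each center
            d = np.array([similarity(s, c) for c in centers])
            j = int(np.argmin(d))
            labels[i] = j if d[j] < tau else -1             # (R3) assign only if within threshold
        new_centers = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])  # (R4) update centers
        if np.allclose(new_centers, centers):               # stop when centers are unchanged
            break
        centers = new_centers
    return labels, centers
```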

Step 2: Nodes are linked based on their actual geographical distance, and strong links are obtained: if the actual geographic distance between two nodes in a cluster is less than a threshold, they are linked. As shown in Fig. 2, the solid lines are strong links and the dotted lines are weak links.
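
As a minimal illustration of this linking rule (the node coordinates, cluster labels, and distance threshold `d_max` are assumed inputs, not quantities named in the paper):

```python
import numpy as np

def build_strong_links(coords, labels, d_max):
    """Create a strong link between every pair of same-cluster nodes whose
    geographic distance is below d_max (the threshold value is an assumption)."""
    links = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if labels[i] == labels[j] and labels[i] != -1:  # same cluster
                if np.linalg.norm(np.asarray(coords[i]) - np.asarray(coords[j])) < d_max:
                    links.append((i, j))                    # strong link
    return links
```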

Fig. 2. Link state.

Step 3: Construct and train the OR-ELM using q training samples, and predict the future time series. Figure 3 shows the network structure of the OR-ELM.

Fig. 3. The network structure of the OR-ELM.

ELM is a learning algorithm for training a single-hidden-layer feedforward network (SLFN). For N training samples \(\left( {{x_j},{t_j}} \right) \), the output of an ELM with L hidden nodes can be written as

$$\begin{aligned} \sum \limits _{i = 1}^L {\beta _i}g\left( {\omega _i} \cdot {x_j} + {b_i} \right) = {t_j},\quad j = 1,2, \cdots ,N, \end{aligned}$$
(4)

where \({\beta _i}\) is the output weight, \({\omega _i}\) and \({b_i}\) denote the input weights and bias of the ith hidden node, respectively, and \(g\left( \cdot \right) \) is a nonlinear activation function. This can be expressed compactly in matrix form:

$$\begin{aligned} \mathbf H \beta = \mathbf T , \end{aligned}$$
(5)

where \(\mathbf H \) denotes the hidden-layer output matrix of the SLFN, and the output weights are given as follows:

$$\begin{aligned} \beta = {\mathbf {H}^\dag }\mathbf {T},{\mathbf {H}^\dag } = {\left( {{\mathbf {H}^T}\mathbf {H} + \frac{\mathbf {I}}{C}} \right) ^{ - 1}}{\mathbf {H}^T}, \end{aligned}$$
(6)

where \({\mathbf {H}^\dag }\) is the Moore–Penrose generalized inverse of the matrix \(\mathbf {H}\) and C is a regularization constant. The output weight \({\beta _0}\) for an initial training dataset with \({N_0}\) training samples is calculated as

$$\begin{aligned} {\beta _0} = {\mathbf {P}_0}\mathbf {H}_0^T{\mathbf {T}_0},\quad {\mathbf {P}_0} = {\left( {\mathbf {H}_0^T{\mathbf {H}_0} + \frac{\mathbf {I}}{C}} \right) ^{ - 1}}. \end{aligned}$$
(7)
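
For illustration, a minimal regularized ELM corresponding to Eqs. (4)-(7) can be sketched as follows; the uniform weight initialization and tanh activation are assumptions, not choices specified in the paper.

```python
import numpy as np

def train_elm(X, T, L, C, seed=0):
    """Batch ELM (Eqs. 4-7): random hidden layer, regularized least-squares readout.
    X: inputs (N x n), T: targets (N x m), L: hidden nodes, C: regularization constant."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(L, X.shape[1]))  # input weights omega_i
    b = rng.uniform(-1, 1, size=L)                # hidden biases b_i
    H = np.tanh(X @ W.T + b)                      # hidden-layer output matrix (N x L)
    # beta = (H^T H + I/C)^{-1} H^T T, i.e., Eq. (6)
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Forward pass of the trained SLFN (Eq. 5)."""
    return np.tanh(X @ W.T + b) @ beta
```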

During the online sequential learning phase, whenever the new input data of the \({N_{k + 1}}\) training sample arrives, the output weight is updated:

$$\begin{aligned} {\beta _{k + 1}} = {\beta _k} + {\mathbf{P}_{k + 1}}\mathbf{H}_{k + 1}^T\left( {\mathbf{T}_{k + 1}} - {\mathbf{H}_{k + 1}}{\beta _k} \right) , \end{aligned}$$
(8)
$$\begin{aligned} {\mathbf{P}_{k + 1}} = {\mathbf{P}_k} - {\mathbf{P}_k}\mathbf{H}_{k + 1}^T{\left( \mathbf{I} + {\mathbf{H}_{k + 1}}{\mathbf{P}_k}\mathbf{H}_{k + 1}^T \right) ^{ - 1}}{\mathbf{H}_{k + 1}}{\mathbf{P}_k}, \end{aligned}$$
(9)

where the subscript \(k+1\) denotes the \((k+1)\)th chunk of input data, with k increasing from zero, and \({\mathbf{H}_{k + 1}}\) is the hidden-layer output for that chunk. The input sample \(x\left( {k + 1} \right) \in {R^{n \times 1}}\) is propagated to the hidden layer, so the hidden-layer output matrix \(\mathbf{H}_{k + 1}^i\) can be calculated as follows:

$$\begin{aligned} \mathbf{H}_{k + 1}^i = g\left( \mathrm{norm}\left( \mathbf{X}_{k + 1}^i \, x\left( {k + 1} \right) \right) \right) . \end{aligned}$$
(10)

Note that the norm function is applied before the nonlinear activation as a layer normalization procedure to prevent internal covariate shift. Then, the output weight \(\beta _{k + 1}^i\) is calculated using recursive least squares (RLS):

$$\begin{aligned} \beta _{k + 1}^i = \beta _k^i + \mathbf{P}_{k + 1}^i {\left( \mathbf{H}_{k + 1}^i \right) ^T}\left( x\left( {k + 1} \right) - \mathbf{H}_{k + 1}^i\beta _k^i \right) , \end{aligned}$$
(11)
$$\begin{aligned} \mathbf{P}_{k + 1}^i = \frac{1}{\lambda }\mathbf{P}_k^i - \mathbf{P}_k^i{\left( \mathbf{H}_{k + 1}^i \right) ^T}{\left( {\lambda ^2} + \lambda \mathbf{H}_{k + 1}^i\mathbf{P}_k^i{\left( \mathbf{H}_{k + 1}^i \right) ^T} \right) ^{ - 1}}\mathbf{H}_{k + 1}^i\mathbf{P}_k^i. \end{aligned}$$
(12)

Here the forgetting factor is set so that \(\lambda \in \left( {0,1} \right] \). The transpose of \(\beta _{k + 1}^i\) serves as the input weight \(\mathbf{X}_{k + 1}\) of the OR-ELM:

$$\begin{aligned} \mathbf{X}_{k + 1} = {\left( \beta _{k + 1}^i \right) ^T}. \end{aligned}$$
(13)

Similarly, the hidden-layer output matrix \(\mathbf{H}_{k + 1}^h\) is given by

$$\begin{aligned} \mathbf{H}_{k + 1}^h = g\left( \mathrm{norm}\left( \mathbf{X}_{k + 1}^h \, \mathbf{H}_k \right) \right) . \end{aligned}$$
(14)

Then, RLS is used to calculate the output weight \(\beta _{k + 1}^h\), and the transpose of \(\beta _{k + 1}^h\) is used as the hidden weight \(\mathbf{Y}_{k + 1}\) of the OR-ELM:

$$\begin{aligned} \mathbf{Y}_{k + 1} = {\left( \beta _{k + 1}^h \right) ^T}. \end{aligned}$$
(15)

Now, for the \((k+1)\)th input \(x(k+1)\), the OR-ELM hidden-layer output matrix \(\mathbf{H _{k + 1}}\) is calculated as follows:

$$\begin{aligned} \mathbf{H}_{k + 1} = g\left( \mathrm{norm}\left( \mathbf{X}_{k + 1} \, x\left( {k + 1} \right) + \mathbf{Y}_{k + 1}\mathbf{H}_k \right) \right) . \end{aligned}$$
(16)
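
Putting Eqs. (8)-(16) together, the following condensed sketch shows one plausible online step of the OR-ELM. It is an illustration under stated assumptions (tanh activation, zero-initialized auto-encoder output weights, row-vector conventions), not the reference implementation.

```python
import numpy as np

def layer_norm(v, eps=1e-8):
    """Layer normalization applied before the activation (the norm(.) of Eqs. 10, 14, 16)."""
    return (v - v.mean()) / (v.std() + eps)

def rls_step(beta, P, H, target, lam):
    """One forgetting-factor RLS update (Eqs. 11-12 in equivalent factored form);
    H and target are row vectors."""
    P = (P - P @ H.T @ np.linalg.inv(lam + H @ P @ H.T) @ H @ P) / lam
    beta = beta + P @ H.T @ (target - H @ beta)
    return beta, P

class ORELM:
    """Condensed OR-ELM: two online ELM auto-encoders learn the input and hidden
    weights (Eqs. 10-15); the recurrent state follows Eq. (16) and the readout is
    updated by RLS (cf. Eqs. 8-9). A sketch, not the authors' implementation."""

    def __init__(self, n_in, L, lam=0.96, C=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.lam = lam
        self.Xi = rng.uniform(-1, 1, (n_in, L))  # auto-encoder projection for input weights
        self.Xh = rng.uniform(-1, 1, (L, L))     # auto-encoder projection for hidden weights
        self.beta_i, self.P_i = np.zeros((L, n_in)), np.eye(L) * C
        self.beta_h, self.P_h = np.zeros((L, L)), np.eye(L) * C
        self.H = np.zeros((1, L))                # previous hidden state H_k
        self.beta, self.P = np.zeros((L, 1)), np.eye(L) * C  # readout weights

    def step(self, x, t):
        """One online step: predict from input x (length n_in), then learn scalar target t."""
        x = np.asarray(x, dtype=float).reshape(1, -1)
        # Eqs. (10)-(13): input-weight auto-encoder, then X_{k+1} = (beta^i)^T
        Hi = np.tanh(layer_norm(x @ self.Xi))
        self.beta_i, self.P_i = rls_step(self.beta_i, self.P_i, Hi, x, self.lam)
        X = self.beta_i.T
        # Eqs. (14)-(15): hidden-weight auto-encoder, then Y_{k+1} = (beta^h)^T
        Hh = np.tanh(layer_norm(self.H @ self.Xh))
        self.beta_h, self.P_h = rls_step(self.beta_h, self.P_h, Hh, self.H, self.lam)
        Y = self.beta_h.T
        # Eq. (16): new recurrent hidden state from the previous state H_k
        self.H = np.tanh(layer_norm(x @ X + self.H @ Y))
        y = float(self.H @ self.beta)            # prediction before the readout update
        # Eqs. (8)-(9)-style RLS update of the readout weights
        self.beta, self.P = rls_step(self.beta, self.P, self.H, np.array([[t]]), self.lam)
        return y
```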

Step 4: The base station and the sensor nodes both use the OR-ELM to predict the future perceived data, which guarantees that the data sequences at the sensor nodes and the base station stay synchronized. If the error between the predicted and the perceived value is below the threshold, the sensor node does not need to transmit the data to the base station; the predicted data are saved, and the terminal treats them as the perceived data for the current period.
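
A minimal sketch of this dual-prediction rule (the model interface and the None-means-suppressed convention are assumptions for illustration):

```python
def sense_and_report(model, x_window, actual, e_max):
    """Dual prediction (Step 4): the node and the base station run identical OR-ELM
    copies, so the node transmits only when the prediction error exceeds e_max."""
    predicted = model.predict(x_window)  # hypothetical prediction call
    if abs(actual - predicted) <= e_max:
        return None                      # suppressed: base station reuses its own prediction
    return actual                        # prediction failed: send the real reading (Step 5)
```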

Step 5: If the prediction fails, the data are transmitted to other nodes in the link and forwarded. Figure 4 displays the node connection diagram, where the blue nodes represent nodes that need to transmit data.

Fig. 4. Network topology.

Fig. 5. Prediction results.

3 Experimental Results

To evaluate the proposed scheme, simulations are performed on the Intel Lab dataset, a real dataset collected from 54 distributed sensors over a one-month period.

The OR-ELM input dimension is configured to 250 steps with a 250-step time delay, and its output dimension is set to 1. That is, the proposed scheme continuously accepts sequences of the past 250 steps as input, and the target output is the value 5 steps into the future. Because the first 250 data sampling points of each node are used for clustering, the statistical analysis is also performed on the data after these 250 sampling points. The forgetting factor is set to \(\lambda = 0.96\), and the number of hidden nodes is \(L = 25\).
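
As a concrete illustration of this setup, a minimal windowing sketch (plain NumPy; the function name is an assumption, the settings are taken from the text):

```python
import numpy as np

def make_windows(series, n_in=250, horizon=5):
    """Turn a sensor time series into (input, target) pairs: the past n_in readings
    predict the value `horizon` steps after the last input."""
    X, y = [], []
    for t in range(len(series) - n_in - horizon + 1):
        X.append(series[t:t + n_in])            # past 250 steps as input
        y.append(series[t + n_in + horizon - 1])  # value 5 steps ahead as target
    return np.array(X), np.array(y)
```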

Two nodes are randomly selected to demonstrate the performance of the algorithm. Figure 5 shows the OR-ELM time-series predictions for nodes 9 and 10 on this dataset. The prediction success rates of nodes 9 and 10 are compared as the error threshold e-max varies from \(0^\circ \) to \(0.22^\circ \). As shown in Fig. 6, when e-max = 0.1, the algorithm achieves about 90% successful prediction for all nodes; in other words, over 90% of the collected sensor data does not have to be transmitted to the base station. As the threshold increases, the algorithm achieves a significant further reduction in communication. The clustering results show that nodes 9 and 10 are strongly linked. With the threshold set to 0.1, the success rates of nodes 9 and 10 are 91.96% and 88.76%, respectively, and for 65.5% of their real data the difference between the two nodes is less than 0.1. The calculation shows that there are only 85 sample points for which node 9's prediction fails and its data is not similar to node 10's; in other words, of the 2750 sensed data points, only 85 need to be sent to the terminal. Another 135 sampling points fail prediction but are similar to node 10's data and can therefore be replaced by node 10's values. Figure 7 shows the reduction rate of transmitted data. Although the results are reported per node, data reduction at the individual sensor nodes translates into a reduction for each cluster.

Fig. 6. The comparison of the prediction success rates.

Fig. 7. The comparison of the reduced amount of transmitted data.

4 Conclusions

This paper has presented a data fusion scheme for wireless sensor data transmission that improves performance through clustering and prediction. The OR-ELM shows higher accuracy than the other algorithms, and highly correlated data between nodes are clustered. Additionally, nodes can compare their data with connected nodes before transmission to further reduce the amount of data. Finally, experimental results have shown that the proposed scheme improves the prediction accuracy and reduces spatial and temporal redundant transmissions with low computational cost.