1 Introduction

Wireless sensor networks (WSNs) play an indispensable role in many applications due to their low cost, easy deployment, dynamic networking capability, and easy expansion. WSNs are used in environmental monitoring, military applications, and real-time target tracking systems, among others [1, 2]. A WSN is composed of many small, low-power sensor nodes deployed in a monitored area to form a wirelessly communicating network. The main power source of a sensor node is a limited, non-rechargeable battery, so it is crucial to design WSNs with a special focus on energy constraints [3].

A large number of nodes are distributed in the area to gather comprehensive information from the monitored region. Some of this data is similar or mutually consistent, which results in a large number of redundant packets [4, 5]. According to the data acquisition method, WSNs can be divided into continuous, event-driven, and periodic acquisition categories [6]. The event-driven model suits fire monitoring, pollution monitoring, medical rescue, and similar applications. In these applications, when a new event occurs, nodes within the event area upload critical change data, which may cause congestion [7]. Once an event of interest is detected, nodes may enter a high working frequency. In monitoring based on event-driven reporting, clustering can reduce energy consumption compared to unscheduled systems by reducing collisions, idle listening, and overhearing, at the cost of coordination messages during the cluster formation period [6].

WSNs mainly expend energy on data communication and processing, and data transmission is especially energy-intensive [8]; its energy cost is closely related to the amount of data transmitted. Data fusion is a technique that uses in-network processing to remove incorrect and redundant data from sensor node measurements so as to efficiently return information upon the user's request. Effective data fusion not only minimizes communication collisions but also reduces energy costs, since the amount of transmitted data shrinks [9]. In practical applications, all fusion methods encounter various uncertainties. Dempster–Shafer evidence theory (D–S evidence theory) provides a natural and powerful method for representing and synthesizing uncertain information and is commonly used to fuse data [10].

In this work, we concentrate on event-driven data fusion, grouping nodes within the event area into clusters to reduce the number of data packets. This paper presents a data fusion algorithm based on event-driven clustering and D–S evidence theory (EDDS). The network forms clusters, dependent on a given threshold, only when an event occurs, which prevents excessive energy consumption under normal circumstances. After the data is transferred from cluster heads (CHs) to the sink node, weighted data fusion is performed with reasonable confidence and combination rules. This process reduces the influence of anomalous monitored data while increasing the weight of similar data, helping the system determine whether an event has truly occurred.

The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 reviews the D–S evidence theory concept, and Sect. 4 explains the proposed scheme and its specifications. Simulation results are presented and discussed in Sect. 5. Section 6 provides a brief summary and conclusion.

2 Related Works

LEACH [11] is one of the clustering algorithms most commonly used to conserve WSN energy. It assumes that all nodes have uniform energy consumption. In reality, nodes consume uneven amounts of energy, so it is unreasonable to select CHs in an equally distributed manner. Several researchers [12,13,14] have proposed clustering methods based on event-driven data acquisition. Manjeshwar et al. [12] assumed that clustering plays the same role as in LEACH and that nodes use a given threshold to determine whether data needs to be forwarded. Ozger et al. [13] proposed an event-driven spectrum-aware clustering protocol which forms clusters after event detection and maintains them until the end of the event. After an event is detected, CHs are selected among appropriate nodes to form clusters between the event and the sink node; each one-hop member is selected to maximize the number of available two-hop neighbors. After the event, the cluster is dissolved to avoid the energy consumption of unnecessary clustering and maintenance. Adulyasas et al. [14] presented EET, an energy-aware clustering technique for data monitoring based on event-driven data reporting in WSNs. In EET, sensor nodes upload data only when it changes beyond a given threshold, so clusters are created only in the specific locations where such changes occur. Clusters operate as long as the ambient situation continues to change; once it stabilizes, the clusters are reset and each member switches to sleep mode to conserve the energy otherwise consumed by cluster heads and members. Hou et al. [15] proposed a data fusion algorithm based on an event-driven dynamic clustering scheme and a neural network, where dynamic clustering and cluster head election are based on the severity of the event and the nodes' residual energy. A backpropagation (BP) neural network model is used to fuse the large amounts of collected data.

D–S evidence theory transforms subjective, uncertain, and conflicting information into objective decision-making results [16]. There are two main reasons that D–S evidence theory does not readily reach the accuracy needed for fusion. First, it is difficult to ensure a reasonable and accurate basic belief assignment function (BBAF). Second, it is highly challenging to make decisions with the unified BBAF [17]. Many researchers have attempted to resolve these problems [17,18,19]. Shen et al. [17] proposed an integrated model based on D–S evidence theory and an extreme learning machine: reasonable basic belief assignments are established, comprehensive assignments are obtained by synthesizing evidence from several mass functions, and the final decision is made by the extreme learning machine to secure reliable multi-sensor data fusion results. Wang et al. [18] divided sensor node data into several groups according to deviation, then applied basic probability assignment to generate a discrimination framework. The mass function of the combined evidence is taken as the weight distribution function of the evidence, and a unified result is obtained by a weighted summation rule. Liu et al. [19] introduced a D–S evidence theory-based fault-tolerant event detection algorithm which analyzes how the spatial correlation between nodes at different distances, and the node status, affect event detection performance. The output of each sensor node is characterized as weighted evidence instead of crisp values, and neighboring node status values are fused according to their individual contributions to the detection.

3 Preliminaries of D–S Evidence Theory

The basic idea of D–S evidence theory is to establish a discernment framework, determine the degree of support for each piece of evidence, and then apply an evidence synthesis formula to calculate the support for all propositions. Under D–S evidence theory, the set of all mutually exclusive possible outcomes of a decision problem defines the discernment framework, Θ = {θ1, θ2, …, θn}.

3.1 Basic Concepts

Several belief functions in D–S evidence theory are described below.

A function \(m: 2^{\Theta} \to [0, 1]\), with \(0 \le m(A) \le 1\) for all \(A \subseteq \Theta\), is a basic belief assignment function if it satisfies

$$\begin{cases} \sum\limits_{A \subseteq \Theta} m(A) = 1 \\ m(\Phi) = 0 \end{cases}$$
(1)

where Φ denotes the empty set and m(A) is the basic belief assigned to subset A. The BBAF reflects the degree of evidence support for each subset. A subset A with non-zero mass is called a focal element, and (A, m(A)) is a piece of evidence.

Let m be a BBAF on the discernment framework Θ. The impact of the evidence on a given subset A of Θ is measured in two ways, belief and plausibility, denoted Bel(A) and Pl(A) and defined in the following equation:

$$Bel(A) = \sum\limits_{X \subseteq A} m(X), \qquad Pl(A) = \sum\limits_{X \cap A \ne \Phi} m(X) = 1 - Bel(\bar{A})$$
(2)

where \(\bar{A}\) is the complement of A, Bel(A) indicates the degree of confidence that A is true, and Pl(A) indicates the degree to which A is not false. For any focal element of Θ, the corresponding BBAF contributes a belief interval [Bel(A), Pl(A)], whose lower and upper bounds represent the belief and plausibility, respectively.
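
As a quick sanity check on Eq. (2), the following minimal Python sketch computes Bel and Pl over a two-outcome framework; the masses are invented for the demonstration and are not from the paper.

```python
# Illustration of Eq. (2) with invented masses; frozensets model subsets of Θ.
E, N = frozenset({"event"}), frozenset({"no_event"})
THETA = E | N
m = {E: 0.6, N: 0.1, THETA: 0.3}  # a BBAF: masses sum to 1, m(∅) = 0

def bel(a, m):
    """Bel(A): total mass committed to subsets of A."""
    return sum(v for s, v in m.items() if s <= a)

def pl(a, m):
    """Pl(A): total mass on sets intersecting A."""
    return sum(v for s, v in m.items() if s & a)

print(bel(E, m), pl(E, m))  # 0.6 and 0.9: the belief interval [0.6, 0.9]
```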

3.2 Combinational Rule

Let \(m_1, m_2, \ldots, m_n\) be n independent BBAFs on Θ. For a given subset A of Θ, the generalized rule for combining the n pieces of evidence is:

$$\begin{aligned} m_{1,2, \ldots ,n}(A) &= (m_{1} \oplus m_{2} \oplus \cdots \oplus m_{n})(A) \\ &= \begin{cases} \dfrac{1}{1 - K}\sum\limits_{A_{1} \cap A_{2} \cap \cdots \cap A_{n} = A} \prod\limits_{i = 1}^{n} m_{i}(A_{i}) & A \ne \Phi \\ 0 & A = \Phi \end{cases} \end{aligned}$$
(3)
$$K = \sum\limits_{A_{1} \cap A_{2} \cap \cdots \cap A_{n} = \Phi} \prod\limits_{i = 1}^{n} m_{i}(A_{i})$$
(4)

where K measures the degree of conflict among the evidences; if K = 1, the evidences are completely conflicting and cannot be combined.
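
To make Eqs. (3)–(4) concrete, here is a small, hypothetical Python sketch of Dempster's rule for two bodies of evidence; since ⊕ is associative, n evidences can be combined by folding pairwise. The sensor masses are invented for the demo, not taken from the paper.

```python
# Hypothetical sketch of Dempster's combination rule (Eqs. 3-4).
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BBAFs given as dicts mapping frozenset -> mass."""
    combined, conflict = {}, 0.0  # conflict accumulates K of Eq. (4)
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("K = 1: completely conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}  # Eq. (3)

E, N = frozenset({"event"}), frozenset({"no_event"})
THETA = E | N
m1 = {E: 0.7, N: 0.1, THETA: 0.2}  # two invented sensor reports
m2 = {E: 0.6, N: 0.2, THETA: 0.2}
print(dempster_combine(m1, m2))    # mass on "event" rises to 0.85
```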

4 Data Fusion Based on Event-Driven Clustering and D–S Evidence Theory

4.1 Network Model

The network described in this paper monitors events, then transmits and processes the monitored event data. The system consists of stationary, energy-limited sensor nodes and one sink node. All sensor nodes are distributed randomly in the monitored area and have unique IDs. Each sensor node learns relevant information, such as location and ID, for itself and its one-hop neighbors. The clustering process is event-driven. A node that becomes a CH can adjust its transmission power according to the communication distance.

4.2 Threshold Definition

Definition 1

Stimulation Hard Threshold (SHT): whenever the sampled value of a node exceeds SHT, an abnormal phenomenon is confirmed.

Definition 2

Biased Threshold (BT): the minimum deviation at which a node considers a data change within the monitored region necessary to report.

Definition 3

Cluster head election value (P): the probability that a node becomes a CH.

$$P = \alpha \cdot (T_{i} - SHT) + (1 - \alpha ) \cdot \frac{{E_{s} }}{{d_{{{\text{to}}S}} }}$$
(5)

where \(T_i\) is the sampled value of the node at the current time, \(E_s\) is the surplus energy of the node, \(d_{toS}\) is the distance from the node to the sink node, and α is the coefficient of event severity (a code sketch evaluating P follows Definition 4).

Definition 4

Cluster Lifetime (CL): the time used to judge the continued existence of a cluster. If no abnormal data is monitored by time t + CL, where t is the time at which the node was selected as a CH, the cluster is disbanded.
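
As promised above, a minimal sketch evaluating the election value P of Eq. (5) follows; the numeric inputs are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of Eq. (5); argument values are illustrative assumptions.
def election_value(t_i, sht, e_s, d_to_sink, alpha):
    """P = alpha*(T_i - SHT) + (1 - alpha) * E_s / d_toS."""
    return alpha * (t_i - sht) + (1.0 - alpha) * e_s / d_to_sink

# A hotter reading, more surplus energy, or a shorter path to the sink
# all raise P, making the node more likely to be elected CH.
print(election_value(t_i=82.0, sht=60.0, e_s=0.45, d_to_sink=35.0, alpha=0.5))
```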

4.3 Node States

EDDS nodes have three states, described below along with how the thresholds defined above are used in each; a code sketch of the state transitions follows the list.

  1. Sleep state: This state represents a node's situation before it begins to work. Nodes do not communicate with each other during this time, so as to conserve energy. The active state is triggered at regular time intervals. A node also returns to this state whenever its energy is depleted or its cluster is disbanded.

  2. Active state: The main tasks in this state are gathering data and identifying abnormal data with significant changes (\(|T_{i} - A_{i - 1} | > {\text{BT}}\), where \(A_{i-1}\) is the last value uploaded to the CH). A node that finds abnormal data seeks a CH, receives the cluster-forming message from the CH, and transmits its data to the CH as a member.

  3. Excited state: Once a node detects an event (\(T_i > SHT\)), it enters the excited state, in which it carries out tasks such as data collection, data transmission, and high-frequency data processing. The node may also become a CH if its P value is maximal among its neighbors in the excited state.

4.4 Event-Driven Cluster Formation

When a node receives cluster-formation messages from different CHs, it joins the cluster whose CH has the greater surplus energy. To prevent the creation of an excessive number of clusters, which would lead to redundancy, nodes already labeled as cluster members neither join a new cluster nor become CHs when they are stimulated again. The cluster formation process upon abnormal data detection is illustrated in Fig. 1.

Fig. 1 Cluster formation process

4.5 Event-Driven Data Fusion Under D–S Evidence Theory

4.5.1 Cluster Member Preprocessing

A sensor node compares its current sampled value \(T_i\) with the threshold SHT and with the value most recently uploaded to the CH, \(A_{i-1}\). If \(T_i > SHT\) or \(|T_{i} - A_{i - 1} | \ge {\text{BT}}\), then \(T_i\) is transferred to the CH; otherwise, \(T_i\) is discarded. In general, non-events comprise large portions of the monitoring period (i.e., data changes are small), so the nodes are asleep most of the time. This reduces the number of transmitted data packets and saves energy.
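
A minimal sketch of this preprocessing filter follows; the readings and thresholds are illustrative.

```python
# Sketch of the Sect. 4.5.1 filter: upload T_i only on an event
# (T_i > SHT) or a significant change (|T_i - A_{i-1}| >= BT).
def should_upload(t_i, a_prev, sht, bt):
    return t_i > sht or abs(t_i - a_prev) >= bt

a_prev, sht, bt = 20.0, 70.0, 5.0      # illustrative values
for t_i in [20.1, 20.3, 26.0, 71.5]:   # successive samples
    if should_upload(t_i, a_prev, sht, bt):
        a_prev = t_i                   # A_{i-1} becomes the last uploaded value
        print("upload", t_i)           # prints 26.0 and 71.5 only
```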

4.5.2 Data Fusion Under D–S Evidence Theory

The data set T = {T1, T2, …, Tj} monitored at time t by nodes belonging to the same cluster is regarded as the discernment framework, and each cluster is considered a piece of evidence over this framework. The CH first combines the BBAFs generated by the nodes within the cluster. The BBAF of each data value in the combined evidence is taken as the weighting coefficient of the fusion, which yields the weighted data fusion result.

Under statistical theory, effective monitored values fall within a specific neighborhood of the true values; values outside this neighborhood are affected by environmental noise, human disturbance, or systematic errors. The BBAF assigned to \(T_i\) in the kth set can be obtained as follows:

$$m_{k}(T_{i}) = \beta \exp\left( -\ln 2 \cdot \frac{j\left( T_{i} - T_{M}^{k} \right)^{2}}{\sum\nolimits_{i = 1}^{j} \left( T_{i} - T_{M}^{k} \right)^{2}} \right), \quad 0 < \beta < 1$$
(6)

where β is a trust coefficient which can be altered to adjust the discrimination degree of the obtained BBAF, j is the total quantity of data in a set, and \(T_{M}^{k}\) is the median monitored value in the kth set. \(m_k(T_i)\) reduces the impact of outliers: the closer \(T_i\) is to the median, the higher its BBAF.

The combined BBAF of the kth set is calculated as follows:

$$m_{k} = m_{k}(T_{1}) \oplus \cdots \oplus m_{k}(T_{j})$$
(7)

The fusion result of the kth set of data is:

$$T^{k} = \sum\limits_{i = 1}^{j} T_{i}\, m_{k}(T_{i}), \quad i = 1, 2, \ldots, j$$
(8)

The per-set results are then reweighted, as described below, before the final fusion result is obtained.
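
Before turning to that reweighting, the sketch below illustrates the within-cluster step of Eqs. (6)–(8) under our reading: each reading's mass falls off with its squared distance from the set median, and we normalize the masses so the fusion weights sum to one (Eq. (8) leaves this implicit). β = 0.27 follows Sect. 5.3; the readings are invented.

```python
# Within-cluster fusion sketch (Eqs. 6-8), with normalized weights.
import math
import statistics

def bbaf(readings, beta=0.27):
    """Eq. (6): mass of each reading, decaying with distance from the median."""
    j, t_m = len(readings), statistics.median(readings)  # t_m is T_M^k
    denom = sum((t - t_m) ** 2 for t in readings)
    if denom == 0:                      # all readings identical
        return [1.0 / j] * j
    return [beta * math.exp(-math.log(2) * j * (t - t_m) ** 2 / denom)
            for t in readings]

def fuse_set(readings, beta=0.27):
    """Eq. (8): mass-weighted fusion of one set."""
    masses = bbaf(readings, beta)
    return sum(t * m for t, m in zip(readings, masses)) / sum(masses)

print(fuse_set([64.2, 65.1, 64.8, 91.0]))  # outlier 91.0 is down-weighted
```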

For effective event-driven WSN monitoring, it is necessary to consider the distance from the event center: nodes closer to the event center yield more valuable monitored data. Because the event center cannot be determined accurately, we use the data set with the earliest timestamp as a fuzzy event center.

Following [20], the distance of evidence \(d_{mass} (m_{1} ,m_{2} )\) can be used to estimate the similarity of the evidence involved in the combination:

$$d_{mass}(m_{1}, m_{2}) = \sqrt{\frac{1}{2}(m_{1} - m_{2})^{T} D\, (m_{1} - m_{2})}$$
(9)

where m1 and m2 are evidence vectors and D is a positive definite matrix whose elements, indexed by focal sets A and B, are given by:

$$D(A, B) = \frac{\left| A \cap B \right|}{\left| A \cup B \right|}$$
(10)

To transform the reliability of a BBAF into an appropriate metric, we define the credibility of the kth set's BBAF as:

$$\eta_{k} = 1 - \frac{{d_{mass} (m_{k} ,m_{0} )}}{{\sum\nolimits_{k = 0}^{n - 1} {d_{mass} (m_{k} ,m_{0} )} }}$$
(11)

where \(\eta_{k}\) expresses that the closer a cluster is to the event source, the greater the credibility of its evidence, and m0 is the belief assignment of the first set of data at the time the event is marked. Assigning a weight to the fused result of each data set further improves the reliability of the monitored data and reduces the conflict among evidences.

The average belief weight \(\bar{\eta }_{k}\) is as follows:

$$\bar{\eta}_{k} = \frac{1}{n}\sum\limits_{k = 0}^{n - 1} \eta_{k}$$
(12)

The fused results are weighted to yield the final result:

$$T = \sum\limits_{k = 0}^{n - 1} \bar{\eta}_{k}\, T^{k}, \quad k = 0, 1, \ldots, n - 1$$
(13)
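
Putting Eqs. (9)–(13) together, the following sketch computes evidence distances against the reference set m0, derives credibilities, and forms the final weighted result. Note one simplification: we normalize the credibilities so the weights sum to one, rather than averaging them as Eq. (12) does; all masses and fused values are invented for the demo.

```python
# Inter-cluster weighting sketch (Eqs. 9-13).
import math

def jaccard(a, b):
    """Eq. (10): D(A, B) = |A ∩ B| / |A ∪ B| for focal sets A and B."""
    return len(a & b) / len(a | b)

def d_mass(m1, m2, focal):
    """Eq. (9): Jousselme distance between two evidence vectors."""
    d = [x - y for x, y in zip(m1, m2)]
    quad = sum(d[i] * jaccard(focal[i], focal[j]) * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(0.5 * quad)

focal = [frozenset("e"), frozenset("n"), frozenset("en")]  # shared focal sets
m0 = [0.8, 0.1, 0.1]                  # reference evidence (earliest set)
evidence = [m0, [0.7, 0.2, 0.1], [0.3, 0.5, 0.2]]
fused = [82.0, 80.5, 74.0]            # per-set results T^k from Eq. (8)

dists = [d_mass(m, m0, focal) for m in evidence]
total = sum(dists)
etas = [1.0 - d / total if total else 1.0 for d in dists]  # Eq. (11)
weights = [e / sum(etas) for e in etas]
print(sum(w * t for w, t in zip(weights, fused)))          # final T, Eq. (13)
```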

Figure 2 summarizes the data fusion process of the proposed EDDS algorithm.

Fig. 2 Data fusion process of EDDS

5 Simulation Results and Analysis

This section discusses our performance evaluation of the proposed EDDS algorithm in MATLAB. We adopted average energy consumption, end-to-end delay, and fractional error as performance indicators, and used an energy consumption model similar to LEACH [11] to analyze node energy consumption. The simulation parameter values are summarized in Table 1.

Table 1 Simulation parameters and values

5.1 Energy Consumption Analysis

Average energy consumption is one of the most important parameters reflecting network performance. To test the performance of our event-driven scheme under different event region sizes and event rates, we ran LEACH, EBPDF, and EDDS for 600 epochs.

Figure 3 shows the average energy consumption for various sizes of the event occurrence region with events occurring 30 times per hour. For an event region of 200 × 200 m2, the average energy consumption of EDDS is 19.6 and 15% lower, on average, than that of LEACH and EBPDF, respectively. As shown in Fig. 4, the average energy consumption of LEACH is consistently the largest and that of EBPDF the second-largest among the three algorithms. At 60 events per hour, the average energy consumption of EDDS is 23.3 and 14.4% lower than that of LEACH and EBPDF, respectively. Average energy consumption increases consistently with the event rate in EDDS and EBPDF but stabilizes in LEACH. LEACH operates independently of event occurrence and keeps the network continuously active over its whole lifetime, so its average energy consumption is higher than the other algorithms'. Clustering in EDDS and EBPDF is tied to event occurrence, so costs increase as more events occur; the nodes participating in cluster formation are limited to the event region and by the event rate. Nodes remain asleep during calm monitoring periods, which enables significant energy conservation compared to LEACH.

Fig. 3 Average energy consumption in various event occurrence regions

Fig. 4 Average energy consumption under different event rates

The proposed algorithm associates D–S evidence theory with an event-driven clustering scheme. The network generates large amounts of data when an event occurs, but this amount is reduced after processing through D–S evidence theory. EDDS requires somewhat costly fusion of the data within clusters, but the amount of communication data per node is fairly low; in short, it operates under a reasonable energy budget.

Figure 5 shows the average energy consumption per node under the same three algorithms with events occurring 60 times per hour in a 200 × 200 m2 region. EDDS showed the lowest energy consumption throughout the simulation period. LEACH and EBPDF deplete their energy at the 620th and 843rd epochs, respectively, whereas EDDS maintains surplus energy through epoch 1000.

Fig. 5 Average energy consumption per epoch

EDDS differs from the other two algorithms mainly in its minimal amount of transmitted data. EDDS uses the given thresholds to sift the sampled data, while LEACH does not exploit a threshold mechanism: if no abnormal data exceeds the thresholds described above, no data is uploaded, and nodes in non-event regions stay asleep, minimizing overall network energy consumption. EBPDF, conversely, continually transfers data after cluster formation without considering any threshold.

5.2 End-to-End Delay Analysis

End-to-end latency is defined as the time from when a packet is received by the CH to when it is delivered to the sink node. Figure 6 plots average end-to-end delay against the event rate, where the network generates data packets at 24 kbps. The latency of EDDS increases slowly while the event rate is below 20 events per hour, then increases quickly. Below 20 events per hour, the nodes sleep for regular intervals and packets are delivered within a specified unit time. As the event rate increases, nodes switch frequently between the sleep and active/excited states, and packets are delivered continuously within a unit time shorter than in the no-event case. The growing number of packets leads to network congestion and long delays [21], which is why the latency of EDDS grows consistently; at the same time, CHs need more time to fuse the received packets.

Fig. 6 End-to-end delay

5.3 Fractional Error Analysis

In this experiment, we used a fire temperature distribution data set (Fire Dynamic Simulator [22]) to evaluate EDDS. The sampled data of the node closest to the ignition source serves as the blank (reference) data, and β = 0.27. Figure 7 shows the fusion results of EDDS and EBPDF; the results of EDDS follow a trend similar to the blank data as the epochs progress.

Fig. 7 Comparison of fusion results

To measure fused data accuracy, we computed the fractional error in the achieved results as follows:

$$Fractional\;error = \frac{\left| Fusion\;result - Blank\;data \right|}{Blank\;data} \times 100\%$$
(14)
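
Eq. (14) is straightforward to apply; the sample numbers in this one-function sketch are illustrative.

```python
# Eq. (14) as a helper; inputs are a fused value and the blank reference.
def fractional_error(fusion_result, blank_data):
    return abs(fusion_result - blank_data) / blank_data * 100.0

print(f"{fractional_error(65.3, 64.1):.2f}%")  # 1.87%
```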

We randomly selected five sets of fusion results to calculate the fractional error, as shown in Table 2. EDDS outperformed EBPDF in processing data, showing a consistently lower fractional error, which indicates that it is feasible and effective.

Table 2 Fractional error of EDDS and EBPDF algorithms

6 Conclusions

This paper presented a data fusion algorithm based on event-driven clustering and D–S evidence theory which yields high-precision fusion results with low control and energy overhead. Nodes report similar data for most of the monitoring time, so thresholds ensure that nodes transmit only necessary data, and clustering is event-driven; consequently, nodes transmit very little data when the network situation is stable, conserving energy. Clusters are created only in areas where abnormal data persists over a certain time period. Multiple data sources are combined into coordinated results through D–S evidence theory: data from the same cluster serves as evidence over the framework, and data closer to the event source is given greater credibility. Compared to two classical algorithms, EDDS is more energy efficient and yields more accurate data fusion for the detection of abnormal WSN data. Future research directions include the utilization of multi-source data or multi-path fusion and the optimization of other EDDS aspects such as fusion quality and latency.