1 Introduction

Wireless sensor networks (WSNs) is a multi-hop self-organizing network formed by a large number of wireless sensor nodes through wireless communication. A large number of sensor nodes are randomly deployed in different environments (for example, some bad environments that human can’t stay for long time), which be used to sense and collect target information so that people can analyze and process it to make reasonable judgments. Nowadays, wireless sensor networks are applied in many areas, such as environmental monitoring, military field, intelligent transportation, logistics tracking, intelligent healthcare [1,2,3] and so on.

However, the sensor node has many disadvantages in the real world at the same time. For example, the sensor node energy is limited and can not add energy, when collecting information, data calculation and sending data, the sensor node will consume energy. The energy consumption of each node will affect the lifetime of entire network. Paper [4, 5] shows that the amount of energy consumed by one Berkeley node executing 800 instructions is almost similar to the energy consumed to transmit 1bit data, therefore, the data traffic between nodes can not be too large in order to extend the life cycle of entire network; The nodes are deployed in an unattended environment and are easily captured by an adversary. The attacker can either steal confidential data directly or tamper the confidential data through faking the sensor node, and the collected data is easy to be detected and eavesdropped when the data are transmitting by wireless. Therefore, it is very important to invent an efficient privacy protection algorithm.

Nowadays, Wireless sensor nodes usually are deployed in military, medical and smart grid and other related applications to collect data, in such a highly confidential application, data leakage and tampering will result in an unpredictable loss. If we can’t protect the data confidentiality, this will expose attackers with highly-sensitive privacy information such as operational environment information in battlefield surveillance applications and patient medical records in healthcare applications. The attacker will know lots of information, which would threaten our security and life.

Although many researchers have proposed a series of aggregation algorithms that can provide data information security algorithms. That can protect the data confidentiality. But there are still some problems with the data confidentiality. After looking through many different data security aggregation algorithms. We presents a D-SMART data aggregation algorithm based on the SMART [3]. The improved D-SMART algorithm extends the network life cycle and enhances the privacy protection.

The rest of the paper is organized as follow: Section 2 presents some typical wireless sensor network security data aggregation algorithms. Section 3 introduces the SMART security data aggregation algorithm in detail. Section 4 states the proposed D-SMART algorithm in detail. Section 5 describes the simulation results, and the paper concluding remarks in Sect. 6.

2 Wireless Sensor Network Security Data Aggregation

Data aggregation technology is one of the key technologies of wireless sensor networks. It can remove redundant information in the network and reduce the amount of data transmission, thus effectively improve the energy and bandwidth efficiency of the whole network [6,7,8]. However, the computing power, storage capacity and energy supply of the wireless sensor nodes are very limited, and all the nodes form the network through wireless self-organization. Therefore, the wireless sensor network data aggregation process is vulnerable to various types of attacks. Through the research and analysis of the development of wireless sensor network security data protection technologies, we can see the potential of the security data protection technology in the future [9,10,11].

Some typical wireless sensor network security data aggregation algorithms were proposed such as the SMART algorithm, the CPDA algorithm, the ESPDA algorithm and so on.

  1. (a)

    The SMART algorithm is a secure data aggregation algorithm based on data slicing technology. The basic idea of the SMART algorithm is to slice the original data information collected by each sensor node in the monitoring target into a fixed number of data slices, and then send the data slices randomly to their own neighbor node, after all nodes have received the data slices and mixed them, the new mixed data will be sent to the aggregation node for data aggregation operation. In this way, when the network is attacked, the node data information is intercepted, the attacker only get several incomplete data slices and not complete the data information, effectively protect the privacy of monitoring data, but it would consume more energy.

  2. (b)

    In the Cluster-based Privacy Data Aggregation (CPDA), the sensor nodes hide the real data values by adding random seeds and private random values to the original data. The cluster head node solves the exact summation aggregation result by algebraic properties of polynomials. However, it is performed by way of polynomial algebra, which not only requires a large amount of computation, but also needs to send information with each neighbouring nodes during aggregation process, resulting in huge energy consumption.

  3. (c)

    Cam and Ozdemir proposed a pattern-based clustered wireless sensor network security data aggregation protocol Energy-efficient and Secure Pattern-based Data aggregation (ESPDA), the idea is as follows: the nodes calculate data pattern code after collecting the original data (the original data are classified and identified by the pattern code, the original data with the same pattern code are similar and can be regarded as redundant data), and then nodes send data according to the pattern codes. If different sensor node sets have the same pattern Code, only need to send one of the encrypted original data to the cluster head node. So the whole original data aggregation process becomes the aggregation of pattern codes, the original data information can be hidden better. This approach achieves more efficient aggregation, but has the disadvantage of being resistant to external attacks and not defending against attacks from internal nodes [12,13,14,15].

3 The SMART Security Data Aggregation

Security data aggregation is the core of wireless sensor networks privacy and security [16,17,18,19,20]. The Slice-Mix-AggRegaTe (SMART) security data aggregation algorithm was proposed by He et al. [3]. This algorithm introduces data slice and mix technology into wireless sensor network security data aggregation issues in the study, provide better privacy protection for the data aggregation process. The security performance and practicality of the SMART algorithm are validated in the simulation experiment [21]. This data privacy protection mechanism is implemented by three phases: the Slice phase, the Mix stage, and the Aggregation phase. The workflow of the protection mechanism is listed as follows:

  1. (a)

    In the Slice phase, the sensor node slices the collected original data into J (J ≥ 3) data slices, and J is a fixed value. In addition to retaining one of the original slices, the node randomly distributes the remaining J − 1 data slices to other neighbour nodes.

  2. (b)

    In the Mix phase, the nodes mix the received data slices with its own remaining original data slices into a new packet.

  3. (c)

    In the Aggregation phase, the nodes transmit the mixed data packets to the aggregation nodes. The aggregation nodes will aggregate the collected data and the received data from other nodes, then continue to transmit to the upper node until all the aggregation data arrives at the base station, and finally integrate and decrypt all the data slices at the base station to obtain the real data.

Although the SMART algorithm has a good data privacy protection performance, but there are still some other deficiencies that affect its practicality. First of all, sending a number of data slices will result in largely increasing the data traffic between nodes, which reduces the working life of nodes in the data slice phase. According to the privacy probability formula in the SMART algorithm, it can be seen that the privacy protection performance of the algorithm is proportional to the number of data slices J, which cause the network system to make compromise on network working life in order to obtain better network security performance. Secondly, with the increase of data transmission, the probability of occurrence of data transmission collision, delay, error will rise, and ultimately affect the accuracy of data aggregation results and data aggregation efficiency [22,23,24]. The proposed security data aggregation D-SMART algorithm is based on the optimization and improvement of SMART algorithm is a more secure and efficient data aggregation algorithm.

4 Improvements to the SMART Algorithm

In the SMART algorithm, the sensor node does not consider whether the perceived information is important or not, slice all the collected data into three slices, but sometimes some unimportant data also be sliced into 3 slices is unnecessary, just consuming excess energy; it is too less to be sliced into 3 slices for some very important data, reducing the data privacy protection performance so that easily reveal the important data.

4.1 The Basic Idea of the D-SMART Algorithm

In this case, we propose the D-SMART algorithm which base on dynamical slicing technology. The algorithm classify the perceived data into three different degree according to their importance, then dynamically slice data and then send the slices according to the importance degree of the perceived data. Unimportant data will be less sliced, important data will be more sliced, the ordinary degree data (2 slices), the important degree data (3 slices), the confidential degree data (4 slices). The algorithm improves the shortcomings of the SMART algorithm in terms of data transmission and data aggregation accuracy, optimizes the construction style of tree aggregation network. It enhances the data privacy protection, reduce the communication cost and computing load of sensor nodes and prolong the network life.

4.2 The Standards of Dynamic Slicing

In this paper, the degree of deviation between the perceived variables by the sensor nodes and the magnitude of their mean values is used to determine the importance degree of the perceived data. Assuming that the sensor node senses × temperature values during T minutes, you can set:

$$\chi = \frac{{x_{1} + x_{2} + \cdots + x_{n} }}{x},\quad \sigma^{2} = \frac{{\sum {(x_{i} - \chi )^{2} } }}{x - 1}$$
(4.1)

we use the degree of \(\sigma^{2}\) decide the data slice number as follow:

$$\left\{ \begin{array}{l} Ordinary\;Data{-}2\;slice,\quad \sigma^{2} < T_{1} \hfill \\ Important\;Data{-}3\;slices,\quad T_{1} \le \sigma^{2} \le T_{2} \hfill \\ {\text{Confidential Data}}{-}4\;slices,\quad \sigma^{2} > T_{2} \hfill \\ \end{array} \right.$$
(4.2)

The T1 and T2 are given threshold.

4.3 The Implementation Steps of the D-SMART Algorithm

In this paper, the wireless sensor network is abstracted as the connected graph G (V, E), where V denotes the sensor node set, |V| = N denotes the number of sensor nodes, and E denotes the communication link of the sensor nodes [25].

In this chapter, all the nodes are classified into three types: Base Station (BS), Aggregator Node (AN) and Leaf Node (LN), where BS is located at the top of the tree-type aggregation network, the aggregator node and the leaf node are distributed at the bottom of the network, The leaf node is only responsible for collecting the data and passing it to the aggregator nodes. The aggregator nodes aggregate these data slices into a new packet and pass the new packet to the BS node. The BS node obtains the final data aggregation result. So we define the data aggregation function as follow [26, 27]:

$${\text{y}}({\text{t}}) = {\text{f}}\left( {{\text{d}}_{1} ({\text{t}}) + {\text{d}}_{2} ({\text{t}}) + \cdots + {\text{d}}_{\text{i}} ({\text{t}})} \right)\quad ({\text{i}} = 1,2, \ldots ,{\text{N}})$$
(4.3)

In the function, di(t) represents the data collected by node i at time t (as is shown in Fig. 1). There are many typical data aggregation functions, such as count, average, max, min, etc. can be simplified to sum function, so we use the sum function:

$$y(t)\sum\limits_{i = 1}^{N} {d_{i} (t)} ,$$
(4.4)

as the research object in this article.

Fig. 1
figure 1

Data aggregation sum function diagram

  1. 1.

    The phase of building the data aggregation tree

    1. (a)

      The BS node broadcasts the “hello” message to the nodes at network lower level to recruit the child nodes. After receiving the “hello” message, any nodes can send the “join_request” message to the sender to apply as its child node. If the node already has a parent node, it will not reply. If a node receives many “hello” messages at the same time, it will randomly select a sender to reply the “join_request” message to become the child node of the sender node. Then the sender acknowledges its node relationship after receiving the “join_accept message”.

    2. (b)

      The system given value Pa determines the probability that any node becomes the aggregator node in the network. Assuming that the total number of nodes is N in the network, so the number of aggregator nodes is N * Pa, the number of other N*(1 − Pa) node are leaf nodes. The aggregator nodes continue to broadcast the “hello” message to the lower neighbor node to recruit the child nodes. When the recruiting behaviors are over in the network, the construction of the data aggregation tree is complete [28,29,30].

  2. 2.

    The data slicing phase

The sensor node dynamically slice the collected original data depending on the degree of \(\sigma^{2}\), encrypt these data slices through the shared keys, then send these data slices to their neighbor nodes. We can see from Fig. 2.

Fig. 2
figure 2

Data slicing

  1. 3.

    The data mixing phase

The sensor node decrypts the received data slices and mixes them with its previous retaining raw slice to generate a new packet. Ab is the new packet after the mixed calculation of node b, Ub is the nodes set which send slice to node b,rab represents the number of slice. The data slices mixed operation function is defined as:

$$A_{b} = \sum\limits_{{a \in U_{b} }} {r_{ab} } .$$
(4.5)

We can see from Fig. 3

Fig. 3
figure 3

Data mixing

  1. 4.

    The data aggregation phase

After the aggregator nodes accept all data slices sent by their child nodes. The aggregator nodes decrypt the slices and then perform the aggregation operation, after encrypting the aggregation result and pass it to the base station. When all the mixed data arrives at the base station, the base station decrypts these mixed data with the shared key and obtains the final data aggregation result. We can see from Fig. 4.

Fig. 4
figure 4

Data aggregation

We can see the whole process of D-SMART algorithm process from Fig. 5.

Fig. 5
figure 5

D-SMART algorithm flow chart

We can see that after the construction of TAG tree, the sensor nodes start collect environment information in the preparing phase; Then, the sensor nodes evaluate the importance of information firstly, slice the data packet depending on the importance degree, mixing the slices into a new packet and aggregate the new packet.

5 Simulation Results and Analysis

In this paper, we will use the embedded Simulator (SOSSIM) in TinyOS as a simulation tool to simulate the network security performance of D-SMART algorithm, TAG algorithm and SMART algorithm respectively. The network environment is deployed as follows: 400 m * 400 m rectangular area, randomly distributing 600 sensor nodes at the region, the aggregator node ratio is set to Pa = 0.1, the background noise is – 105 dBm, Gaussian white noise is 4 dB. Simulation experiments include: communication overhead, the data privacy protection performance and data aggregation accuracy.

5.1 The Performance of Privacy Protection

We define P(q) as the probability of private data being decrypted and use it as a measure of privacy protection, where q is the probability that the link between nodes is decrypted.

In the SMART algorithm, the attacker must break all the J − 1 out-degree links and all the in-degree links, then he is able to completely break its privacy. Correspondingly, P(q) can be defined as:

$${\text{P}}({\text{q}}) = {\text{q}}^{{{\text{J}} - 1}} \sum\limits_{{{\text{k}} = 0}}^{{{\text{d}}_{{{\text{in}}\_\hbox{max} }} }} {P({\text{in-degree}} = {\text{k}}){\text{q}}{}^{\text{k}}}$$
(5.1)

din_max represents the maximum in-degree value of nodes in the network in the formula. The value of din_max in the SMART algorithm is determined by J; p(in-degree = k) represents the probability of the node in-degree value k; \(\sum\nolimits_{{{\text{k}} = 0}}^{{{\text{d}}_{{{\text{in}}\_\hbox{max} }} }} {{\text{P}}({\text{in-degree}} = {\text{k}})} {\text{q}}{}^{\text{k}}\) represents the probability of all the out-degree links between the nodes are eavesdropped; qJ−1 represents the probability that all in-degree links between the nodes are eavesdropped. In the D-SMART algorithm, J represents the maximum number of slices generated by the leaf node; j represents the actual number of slices generated by the leaf node. It can be seen that the formula of the probability of the node privacy exposure P(q, j) in the D-SMART algorithm as follow:

$$\begin{aligned} {\text{P}}_{\text{D}} ({\text{q}},{\text{J}}) & = \sum\limits_{{{\text{j}} = 1}}^{\text{J}} {{\text{P}}({\text{out-degree}} = {\text{j}}){\text{q}}^{{{\text{j}} - 1}} } \\ & \quad *\sum\limits_{k = 0}^{{{\text{d}}_{{{\text{in}}\_\hbox{max} }} }} {{\text{P}}({\text{in-degree}} = {\text{k}}){\text{q}}^{\text{k}} } \\ \end{aligned}$$
(5.2)

In the above formula, P(out-degree = j) represents the probability that the number of out-degree links of the sensor node is equal to j; \(\sum\nolimits_{{{\text{k}} = 0}}^{{{\text{d}}_{{{\text{in}}\_\hbox{max} }} }} {{\text{P}}({\text{out-degree}} = {\text{k}})} {\text{q}}{}^{\text{k}}\) represents the probability that all out-degree links in the node are eavesdropped.

From Fig. 6, with the increase of the probability of communication link being cracked, the privacy data exposure probability of the two algorithms is increasing rapidly. It can be seen that data exposure probability of D-SMART algorithm is higher than SMART algorithm until the probability of communication link being cracked is up to 0.03%. We can see that when the probability of communication link being cracked is up to 0.02%, the privacy data exposure probability is up to 0.082 in D-SMART algorithm, the privacy data exposure probability is up to 0.073% in SMART algorithm. After the point of 0.03%, data exposure probability of SMART algorithm is higher than D-SMART algorithm. We can see that when the probability of communication link being cracked is up to 0.07%, the privacy data exposure probability is up to 0.261 in D-SMART algorithm, the privacy data exposure probability is up to 0.273% in SMART algorithm. It is obvious that the probability of privacy exposure of node data in D-SMART algorithm is obviously lower than the SMART algorithm after the point of 0.03%.

Fig. 6
figure 6

Data privacy protection of SMART and D-SMART

5.2 The Data Communication Overhead

The data transmission between sensor nodes will consume a lot of energy, it is one of the key factors to prolong the network life cycle by reducing the energy consumption between nodes as much as possible. This chapter compares the data transmission overhead of TAG, SMART, and D-SMART algorithms by the simulation.

In the TAG algorithm, the sensor node directly forward its collected data to upper nodes in the network, and its data traffic is N. In the SMART algorithm, the data traffic is N * J, due to node data slicing and sending. In the D-SMART algorithm, the aggregator node does not slice and send the packet, only transmitting packet in the data aggregation phase, so the data transmission traffic is N * Pa. The number of leaf nodes are N * (1 − Pa), and the number of out-degree links for each leaf node is Ji, Ji is the number of data slices of node i. So the data communication overhead of the D-SMART algorithm is:

$${\text{N}}*{\text{Pa}} + \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}(1 - {\text{Pa}})}} {{\text{j}}_{\text{i}} \quad ({\text{j}}_{\text{i}} \in 2,3,4)}$$
(5.3)

\(\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{N}}(1 - {\text{Pa}})}} {{\text{j}}_{\text{i}} \;({\text{j}}_{\text{i}} \in 2,3,4)}\) represent the number of slices send by leaf nodes.

The above Fig. 7 shows the network communication overhead of the three security data aggregation algorithms in the simulation environment. The network communication overhead of TAG algorithm is the lowest, keeping at 600 slices, the SMART algorithm is the highest, keeping at 1800 slices, the node slicing into [2,3,4] slices change in the D-SMART algorithm, the simulation results show that the D-SMART algorithm communication overhead is fluctuating between 1200 and 1400 slices, so the network communication overhead between the TAG and SMART algorithm.

Fig. 7
figure 7

Communication overhead of TAG, SMART and D-SMART

5.3 The Computation Overhead Analysis

One of the limitations of the wireless sensor network is the energy limitation of the sensor nodes. Although the data communication is the main energy consumption mode in the network, however, the energy consumption caused by the calculation load in the network can not be neglected, The computation overhead is also one of the factors that must be considered in the data aggregation security algorithm based on privacy protection in the network. In this section, we will discuss the calculation of the computational load and the effect on the energy consumption of the two algorithm in the network. Table 1 defines several different types of computation overhead.

Table 1 the sign of computation load

Then, for the SMART algorithm, N sensor nodes perform N data slice calculation at the stage of data slicing, and at the same time these nodes need to encrypt and decrypt N * (J − 1) slice data; In the process of data mixing, the sensor nodes need to perform N * (J − 1) arithmetic computation. Then in the process of data aggregation, the nodes perform N arithmetic, encryption and decryption calculation. The computation load of the SMART algorithm—Csmart can be expressed as the follow equation

$${\text{C}}_{\text{SMART}} = {\text{N}} * {\text{C}}_{\text{div}} + {\text{N}} * {\text{J}} * ({\text{C}}_{\text{enc}} + {\text{C}}_{\text{dec}} + {\text{C}}_{\text{cal}} )$$
(5.4)

In the process of data slicing of D-SMART algorithm, only N * (1 − Pa) nodes perform SD-SMART data slice calculation at the stage of data slicing, then these nodes encrypt and encrypt SD-SMART slice data, In the process of data mixing, the sensor nodes need to perform SD-SMART arithmetic computation. Then in the process of data aggregation, the nodes perform N arithmetic, encryption and decryption calculation, thereinto, Pa represents the proportion of aggregation nodes and a2, a3, a4 respectively represents the ratio of different level nodes, so Pa + a2 + a3 + a4 = 1.

$$\begin{aligned} {\text{C}}_{\text{D-SMART}} & = {\text{N}} * (1 - {\text{Pa}}) * {\text{C}}_{\text{div}} \\ & \quad + ({\text{N}} + {\text{S}}_{\text{D-SMART}} ) * ({\text{C}}_{\text{enc}} + {\text{C}}_{\text{dec}} + {\text{C}}_{\text{cal}} ) \\ {\text{S}}_{\text{D-SMART}} & = \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}(1 - {\text{Pa}})}} {{\text{j}}_{\text{i}} \quad ({\text{j}}_{\text{i}} \in 2,3,4)} \\ \end{aligned}$$
(5.5)

Equations 5.4 and 5.5 are the comparison between SMART algorithm and D-SMART algorithm. It can be seen from Eqs. 5.4 and 5.5 that CSMART—CD-SMART have a close correlation between Pa and the number of slices j for each original data in the two algorithms, with the increase of j, the difference of CSMART—CD-SMART will become bigger and bigger. In other word, the advantage of D-SMART algorithm computation load lowering SMART algorithm computation load will become more and more obvious.

$$\begin{aligned} & {\text{C}}_{\text{SMART}} - {\text{C}}_{\text{D-SMART}} = {\text{N}} * {\text{Pa}} * {\text{C}}_{\text{div}} \\ & \quad + [{\text{N}} * ({\text{j}} - 1) - {\text{S}}_{\text{D-SMART}} ] * ({\text{C}}_{\text{enc}} + {\text{C}}_{\text{dec}} + {\text{C}}_{\text{cal}} )\quad ({\text{j}} \ge 3) \\ \end{aligned}$$
(5.6)

5.4 The Data Aggregation Accuracy

Data aggregation accuracy is one of the important indexes that reflects the accuracy performance of the security data aggregation algorithm. It is defined as the ratio of the actual aggregation result to the theoretical aggregation result:

$${\text{Pa}} = \frac{D*}{{\sum\nolimits_{i = 1}^{N} {Di} }},$$
(5.7)

D* represents the final aggregation result of the BS node; \(\sum\nolimits_{{{\text{i}} = 1}}^{\text{N}} {{\text{D}}_{\text{i}} }\) represents the theoretical aggregation. However, the process of data aggregation can not avoid the data collision, delay, retransmission and error of data transmission and so on. Different bit error rate (BER) will influence the aggregation result.

As can be seen from the two above graphs. From Fig. 8, when the BER is 0.03, with the increase of data aggregation time interval, the data aggregation accuracy of the three algorithms is increasing rapidly, in which the TAG algorithm is the fastest and the data aggregation accuracy up to 90% at 10 s; While the SMART algorithm increases at the slowest pace, the data aggregation accuracy up to approximately 54% at 10 s, the D-SMART algorithm growing between the TAG algorithm and the SMART algorithm. The three aggregation accuracy almost does not changes after 10 s, the data aggregation accuracy of SMART algorithm is the lowest up to 68.7%, the TAG algorithm data aggregation accuracy is highest up to 90.2%, the D-SMART algorithm data aggregation accuracy is still between the two algorithms up to 83.4%. From Fig. 9,when the BER is 0.04, with the increase of data aggregation time interval, the data aggregation accuracy of the three algorithms is increasing rapidly, in which the TAG algorithm is the fastest and the data aggregation accuracy up to 88% at 10 s; While the SMART algorithm increases at the slowest pace, the data aggregation accuracy up to approximately 52.1% at 10 s, the D-SMART algorithm growing between the TAG algorithm and the SMART algorithm. The three aggregation accuracy almost does not changes after 10 s, we can know that different BER cause different results, the BER is also a key factor to the aggregation accuracy.

Fig. 8
figure 8

Data aggregation accuracy of TAG, SMART and D-SMART (BER = 0.03)

Fig. 9
figure 9

Data aggregation accuracy of TAG, SMART and D-SMART (BER = 0.04)

6 Conclusion

Based on the analysis and study of other wireless sensor network security protection algorithms in terms of communication overhead, aggregation accuracy and the performance of privacy protection. We propose the D-SMART algorithm which improves the disadvantage of data aggregation accuracy and privacy protection of the SMART algorithm. Meanwhile reducing the data traffic, saving the energy consumption and prolonging the network lifetime.

Data integrity is also a challenge for data security protection. In the future, we will extend the data integrity on the basis of the D-SMART algorithm, we will protect the data integrity while protecting data privacy.