1 Introduction

The use of wireless sensor network has grown enormously in last decade. Wireless sensor networks are used in variety of applications such as monitoring physical and environmental conditions, military surveillance, live stock tracking, home appliance monitoring etc. In most wireless sensor network (WSN) applications nowadays the entire network must have the ability to operate unattended in harsh environments in which pure human access and monitoring cannot be easily scheduled or efficiently managed or it’s even not feasible at all [1, 2]. There is a crucial need for scalable and energy efficient routing and data gathering and aggregation protocols in corresponding large-scale environments. Sensor nodes are generally deployed randomly uncontrolled means in many significant wireless sensor network and they form a network in an Ad hoc manner [3, 4]. Sensors in such networks are battery powered and energy constrained and their batteries usually cannot be recharged. Therefore we need energy aware routing and data gathering protocols that offer high scalability and low energy consumption for a long network lifetime. In wireless sensor network hundreds to thousands of sensor nodes are deployed usually randomly. Sensors sense their environment and send their sensed data to a processing centre, called as “Sink” or “Base Station” where all the data is collected and processed. Many routing algorithms have been proposed for efficient transmission of data between base station and sensor nodes. Grouping of sensor nodes into clusters has been widely used by researchers to satisfy the scalability, high energy efficiency and prolong network lifetime objectives [5,6,7,8]. In clustering the whole sensor network is partitioned into multiple groups of sensor nodes. Each group is called a cluster and each cluster has a leader called cluster head that perform special tasks such as data aggregation and fusion. Clustering has number of other benefits and corresponding objectives, In addition to supporting network scalability and decreasing energy consumption through data aggregation. The route setup can be localize within the cluster and thus reduce the size of the routing table stored at the individual sensor node. Clustering also conserve communication bandwidth by limiting the scope of inter-cluster communication among Cluster heads (CHs) and reduces redundant exchange of information among sensor nodes. Clustered architecture is use full for sensor networks because of its inherent suitability for data fusion the data gathered by all members of cluster can be fused at cluster head. A Cluster head can schedule activities in a cluster so that node can switch to low power sleep mode and reduce the energy consumption [1, 3, 8].

Paper organization: The rest of the paper is organized in following manner. Section 2 describes some representative protocols in the category of probabilistic clustering. Section 3 describes in detail our proposed modifications in existing clustering algorithms. Section 4 provides performance evaluation and simulation results. Section 5 concludes the whole work with a direction to future work.

2 Related Work

There is a widely used class of clustering algorithms called probabilistic clustering algorithms. The main aim of these algorithms was to prolong the network life time and reduce the energy consumption. There are number of clustering algorithms have been specifically designed for WSNs [9,10,11,12,13,14]. These proposed clustering techniques widely vary depending on the node deployment and bootstrapping schemes, the pursued network architecture, the characteristics of the cluster head (CH) nodes and the network operation model. Low energy adaptive clustering hierarchy protocol improves using fuzzy logic, which takes battery level, distance, and node density into consideration [15, 16]. In Energy Efficient Strong Head clustering (EESH) [17], nodes are promoted CHs according to their respective residual energies, their respective degrees and the distance to and the residual energy of their neighbors. A CH may be elected by the sensors in a cluster or pre-assigned by the network designer. A CH may also be just one of the sensors or a node that is richer in resources. Most significant and widely used representatives of this category are LEACH [18], HEED [19], SEP [20], EEHC [21], TEEN [22].

2.1 Low Energy Adaptive Clustering Hierarchy (LEACH)

LEACH is one of the most popular clustering protocols proposed for WSNs. LEACH uses rotation of cluster heads to balance network energy consumption. The operation of the LEACH is divided into a number of rounds. Each round includes a set-up phase and a steady phase. Clusters are organised in set-up phase while sensed data are transferred from sensors to cluster heads in steady phase. In LEACH, cluster heads are elected in a distributed manner and independent from each other. Nodes make their decision to become a cluster heads by choosing a random number r in interval [0, 1]. A sensor node decides to be a CH in a round if its generated random number is less than the below a threshold [18]:

$$T\left( n \right) = \left\{ {\begin{array}{*{20}l} {\frac{p}{{1 - p\left( {r mod \frac{1}{p}} \right)}}} \hfill & { if \quad n \; \in \;G} \hfill \\ { 0 } \hfill & { otherwise} \hfill \\ \end{array} } \right.$$
(1)

where p is the desired percentage of CH nodes in the sensor population, r is the current round number; G is the set of nodes that have not been CHs in the last 1/p rounds.

2.2 Hybrid Energy Efficient Distributed Clustering (HEED)

HEED is another improved and very popular energy efficient protocol. It allows multi hop communication among cluster heads (CHs) and base station (BS). HEED is a hierarchical, distributed, clustering scheme in which a one-hop communication mechanism retained within each cluster [19]. There are two basic parameters, residual energy and intra cluster communication cost to choose CH nodes. The initial set of CHs obtained by using residual energy of nodes. On the other hand, intra-cluster communication cost reflects the node degree or node’s proximity to the neighbour and is used by the nodes in deciding to join a cluster or not. Thus, unlike LEACH, it does not select Cluster head nodes randomly. Only sensors that have a high residual energy can become CH nodes.

2.3 Stable Election Protocol (SEP)

SEP is a heterogeneous proactive protocol proposed in [20] with two levels of heterogeneity. It considers two types of sensor nodes, normal nodes and advance nodes. Advance nodes have more energy as compared to normal nodes. In SEP both types of nodes have weighted probabilities of becoming cluster heads. Advance nodes have more chances of becoming cluster head than normal nodes. Weighted election probabilities of the normal and advance nodes, P nrm and P adv can be calculated as:

$$P_{nrm} = \frac{p}{1 + \alpha m}$$
(2)
$$P_{adv} = \frac{{p\left( {1 + \alpha } \right)}}{1 + \alpha m}$$
(3)

2.4 Energy Efficient Hierarchical Clustering (EEHC)

EEHC is another probabilistic clustering protocol proposed in [21]. LEACH protocol allowed only one level of clustering but EEHC allow more than one level of clustering. This protocol works in a bottom up fashion. It is a distributed algorithm for organization of sensor nodes k-hop hierarchical cluster. Initially, each sensor node is elected as a CH with probability ‘p’ and announces its election to the neighbouring nodes within its communication range. The above CHs are now called the ‘volunteer’ CHs. Next, all the nodes that are within ‘k’-hops distance from a ‘volunteer’ CH, are supposed to receive the election message either directly or through intermediate forwarding. Consequently, any node that receives such Cluster head election message and is not itself a Cluster head (CH), becomes a member of the closest cluster. Additionally, a number of forced Cluster heads are elected from nodes that are neither CHs nor belong to a cluster. Specifically, if the election messages do not reach a node within a preset time interval t, the node becomes a ‘forced’ CH assuming that it is not within ‘k’ hops of all volunteer CHs.

2.5 Threshold Sensitive Energy Efficient Network Protocol (TEEN)

This protocol focuses on information aggregation rather than on cluster formation. It provides a hierarchical clustered structure, grouping nearby nodes within the same cluster. TEEN [22] use data centric method with hierarchical approach. The protocol defines two thresholds: the hard threshold is a threshold (absolute) value for the sensed attribute, while the soft threshold is a threshold (small change) value of the sensed attribute. The soft threshold can be varied, depending of the criticality of the sensed attribute and the target application. The nodes transmit sensor readings only when they fall above the hard threshold and change by given amount (soft threshold). In TEEN, medium continuously sensed by sensor nodes, but data transmission is done less frequently which favours the energy saving. However, if the thresholds are not crossed, the nodes will never communicate. TEEN does not support periodic reports.

3 Proposed Technique

In this section the detailed reasoning of proposed Energy Efficient Probabilistic Clustering Technique (EEPCT) for Data Aggregation in Wireless Sensor Network is discussed. We have used the same radio energy model for power consumption in radio transmission as used in most of the probabilistic Clustering algorithms such as LEACH [18].

3.1 Requirement of Clusters of Almost Same Size

Probabilistic clustering techniques usually do not take into account the relative positions of selected cluster heads; as a result there is a fair chance that in several rounds a considerable number of the CHs will be either in proximity or very far from each other. In case, when a considerable number of cluster heads are in close proximity, the number of nodes in the clusters associated with these cluster heads will be considerably low in comparison to other clusters which will give rise to uneven clustering. While a well balanced clustering i.e. nearly same sized clusters is crucial for reasonable performance of probabilistic clustering techniques such as LEACH, TEEN etc. On the contrary if selected cluster heads are very far from each other, there will be a significant wastage of energy in inter-cluster communication in protocols such as HEED, EEHC etc.

Therefore a uniform distribution of cluster heads over entire network region is desirable to achieve a well balanced clustering.

So as our first proposed improvement, we want to introduce a parameter ξ to denote the closeness and its value depends on the size, node density and number of cluster heads to be selected. If in a particular round the distance between any two selected cluster heads is smaller than ξ, they will be considered too close to each other and one of them has to drop its decision of becoming cluster head (CH).

3.2 Maintaining the Optimal Percentage of CHs in Each Round

Though the above proposed improvement in cluster head selection process causes a significant improvement in the performance of probabilistic clustering algorithms, it has a small drawback. Consider the case when a large number of potential CHs (the nodes for which the value of generated random number is less than the threshold value) in a round of a particular epoch are in close proximity then many of them will drop their decision to become CH. In such a situation the number of selected cluster heads would be significantly lesser than the optimal number of cluster heads. This will lead to bigger size clusters and consequently more energy consumption in intra-cluster communication. Also in the last round of the epoch all the nodes that have not become cluster head so far in that particular epoch will have to become cluster head and so chances will be higher for the selection of more cluster heads than optimal number of cluster heads. This will increase the long distance transmission to the sink. To deal with this situation we want to increase the number of potential cluster heads in each round so that the number of selected CHs after dropping some of potential CHs because of ξ-closeness be as near to optimal value as possible. For this, we need to raise the threshold value in Eq. (1) so that more nodes than usual will be eligible to become cluster head.

So as our second proposed improvement another parameter σ introduced, denoting the threshold increment factor. The increment in the threshold value given by Eq. (1) will not be same for each round in an epoch as original threshold value itself increase in each subsequent round of an epoch and becomes equal to 1 in last round but we want to keep σ constant in each round so we can calculate new increased threshold using

$$T_{NEW} \left( n \right) = T\left( n \right) + \sigma \left( {1 - T\left( n \right)} \right)$$
(4)

In our assumption σ will be constant for a particular configuration of network and a particular choice of closeness factor ξ.

3.3 Choosing Closeness Factor ξ and Threshold Increment Factor σ

The values of ξ and σ are very crucial for better performance of clustering algorithms. Closeness Factor ξ depends on the network configuration, i.e. size of the network, number of nodes deployed in the network, density of nodes, and optimal number of cluster heads, while the value of σ depends on the value of ξ as well as on network configuration. We have calculated the optimal values of ξ and σ experimentally by running multiple instances of the algorithm for different combinations of ξ and σ, keeping other parameters of the network constant.

3.4 Integration with the Existing Probabilistic Techniques

The above two proposed improvements integrated with existing probabilistic clustering techniques to improve their energy efficiency. We have used two generic approaches of integration, one that requires little intervention of centralized authority such as base station and other follows a completely a distributed approach.

  1. 1.

    Centralized Approach: In Fig. 1 functioning of centralized approach of integration discussed.

    Fig. 1
    figure 1

    A centralized approach for integrating proposed improvements to existing algorithms

    Here K is the optimal number of CHs that is precalculated using analytical methods as described in [18]. The predefined performance metric can include QoS factors such as quality of link between candidate and base station, congestion, node density around the candidate node, its residual energy etc.

  2. 2.

    Distributed Approach: In distributed approach each node autonomously takes decision of becoming a cluster head or not without intervention of any central authority. The following Fig. 2 explains a distributed approach for integration.

    Fig. 2
    figure 2

    A distributed approach for integrating proposed improvements to existing algorithms

    Here E(i) is the residual energy of node i and c(ξ) is a function whose value depends only on closeness factor ξ. Waiting for \(\frac{c\left( \xi \right)}{E\left( i \right)}\) time ensures that among all the candidate CHs that are in ξ-closeness range of each other only the node with highest residual energy will be chosen as cluster head (CH).

    Both centralized and distributed approaches have their own advantages. Centralized approach offers better QoS implementation mechanism and more tolerant to variance in values of σ and ξ from optimal value. On the other hand distributed approach requires minimal message exchange among nodes for CH election. Here we consider c as a function of ξ only but for QoS implementation we can make c as a function of other QoS parameters with ξ as well.

4 Simulation Results

We conducted a rigorous simulation study to evaluate the performance of our proposed Energy Efficient Probabilistic Clustering Technique (EEPCT). We assume a square network field of dimension 100 m × 100 m with 100 sensors deployed in it and MATLAB is used for the purpose of simulation. Specific parameters for simulation are shown in Table 1. The performance of EEPCT evaluated and compared with some representative probabilistic clustering algorithms such as LEACH [18], HEED [19], SEP [20] and TEEN [22].

Table 1 Simulation Parameters

4.1 Determination of ξ and σ

To determine optimal values of ξ and σ we run our algorithm multiple times for different combinations of ξ and σ for a given configuration of network and choose best values. Table 2 shows energy consumption for different combinations of ξ and σ for two different network configurations.

Table 2 Determination of ξ and σ

We have calculated the optimal value of ξ and σ for two Network configurations. In first configuration 30 sensors deployed in a 50 m × 50 m network field. In second configuration we have taken a 100 m × 100 m network field with 100 sensors deployed in it. We found 15 m and 0.16 to be optimal values for ξ and σ respectively for first configuration. Similarly for second configuration optimal values are 29 m and 0.13 respectively.

4.2 Performance Evaluation

In above figures we have shown the performance of existing probabilistic clustering algorithms in their original form and after the application of our proposed technique. We can see in Figs. 3 and 4 that after applying our proposed technique there is an increase of 43% in stability period of HEED and an overall 66% increment in total lifetime of network in simulated environment.

Fig. 3
figure 3

Number of alive nodes per round

Fig. 4
figure 4

Number of Cluster heads (CHs) per round

Similarly in case of SEP, in Figs. 5 and 6 we can see a considerable increase in the energy efficiency and network life time. We can also observe that after application of our proposed modification there is significant improvement in the no. of clusters formed in each round.

Fig. 5
figure 5

Number of alive nodes per round

Fig. 6
figure 6

Number of Cluster heads (CHs) per round

Similarly in LEACH, there is a significant improvement of in overall lifetime of the network. There is also an increase of around 37% in stability period of Network (Figs. 7, 8).

Fig. 7
figure 7

Number of alive nodes per round

Fig. 8
figure 8

Number of Cluster heads (CHs) per round

5 Conclusion

In this paper we proposed Energy Efficient Probabilistic Clustering Technique (EEPCT) with two changes in existing probabilistic clustering algorithms to ensure more or less optimal number of clusters and well distributed CHs across the network in each round. We also used two ways to apply proposed technique to existing clustering algorithms. Simulation results show that proposed technique obtains considerable improvement and allows a large stable network lifetime compared to LEACH, SEP, and HEED. In this paper we focused on probabilistic clustering algorithms but in future it can be extended to other class of clustering Algorithms.