1 Introduction

Wireless Sensor Network (WSN) is a collection of small and tiny low-powered sensors with sensing, data processing, transmitting, and receiving abilities. These minute sensors collect sensed data from the environment or the application for which they are deployed, transmit, and relay the information to the base station (BS). The base station is equipped with sink nodes that collect and aggregate the information from the sensors and a decision is made therein. WSN is a synergistic coupling of the sensor, network technology, and wireless communication [1]. The nodes are densely and randomly deployed with a battery as their source of energy. It is therefore impractical to recharge the sensor nodes owing to the inaccessibility of the distribution areas [2]. The energy consumption of each node has to be balanced to add to the longevity of the network [2]. So, energy is consumed in various ways, but heavily during transmission once the connection has been established. Consequently, designing an efficient energy routing protocol is indeed necessary to increase the potential lifetime of the network.

Much pieces of research have been published to investigate the success of various WSN topologies which include flat, clustered, chain-based, and tree configurations [3]. Research efforts have been mostly geared towards lessening the energy consumption of the network because energy source constraints are the major drawbacks of WSN. Of all the measures taken to save the energy consumption caused by communication, designing suitable clustering algorithms stands out. In a clustered-based approach, the nodes present the network are organized into clusters based on certain parameters in which one node serves as the cluster head and the other nodes are connected to it. This paper presents an intensive review of both the probabilistic and non-probabilistic clustering used to extend the lifetime of the WSN. The former depends on a prior assigned probability of the desired percentage of CH in the network whereas the latter is primarily deterministic as the CH election is dependent on certain criteria such as node proximity and degree and the received information from the neighboring nodes.

This paper provides the following goals: (1) To facilitate the comprehension of probabilistic and non-probabilistic clustering in WSN; (2) To ease the understanding of the reader by logically itemizing the subject matter and providing the variants of the algorithms; (3) To help the interested researcher identify alternative solutions and select relevant methods or protocols; (4) To highlight the strengths and weaknesses of the algorithms that fall under probabilistic and non-probabilistic clustering in WSN.

1.1 Clustering in Wireless Sensor Network

Clustering is an important term in WSN which connotes the way the nodes present in the network organize themselves into clusters based on certain parameters. It applies to big networks especially ad hoc where the sensors are employed for sensing purposes. If the whole sensor nodes communicate with one and another to transmit the sensed information to their destination, network congestion and collision will set in and eventually render the network less energy efficient due to limited resources. Clustering is one of the solutions that meet the effective resource utilization and load balancing in WSN. In a clustered network, the sensor nodes are arranged cluster-wise and one serves as the CH that oversees the activities of the cluster and the remaining nodes send data to it. The CH is charged with the responsibility of aggregating the data and send it to the base station. High capacity node advertises themselves as the CH and others join at proximity. Instead, each node transmits the data directly to BS over a large distance, it then transmits within the cluster over a short distance. Also, the periodic re-election of CH from residual energy guarantees balanced energy consumption within the cluster. The efficiency of data transmission is increased by clustering by aggregating the data within the cluster.

It is no doubt that energy conservation is the most significant and popular objective of clustering in WSN [4]. Sensor nodes are deployed in several significant quantities (hundreds to thousands) in a given field depending on the application of interest. Clustering increases the scalability of a bigger network by providing layers of the network, where layers are subdivided into clusters. The sensor nodes are basically dispersed in a harsh and unattended environment, and the nodes are equipped with limited energy. Clustering paves the way for fault-tolerance and self-configured characteristic as the fault tolerance is handled through re-clustering. The sensor nodes are deployed in a given area, sense the same data, and forward it to the destination node. Data aggregation is a strategic way of curtailing the transmission of duplicated and repetitive in a network. Clustering ensures that data aggregation is done in the cluster by the CH to lessen the total load in the network. Clustering ensures load balancing in the network by dedicating the data processing for intracluster communication and inter-cluster interaction for data transmission. In some WSN applications, QoS requirements such as high latency is assured through clustering because the data are routed through the CHs, not the nodes. It is common knowledge in the network that collision avoidance is highly necessary because, for each collision, there is a price for retransmission of packet when lost. This will invariably deplete the energy of the network. However, clustering utilizes TDMA during intracluster data processing to the ward-off collision. Clustering also provides a well-stabilized topology. The changes in the topology are conveniently managed, each CH possesses the data like energy and location about its members. Therefore, if a node suddenly ceases from an operation or migrates to another cluster (if a mobile network is considered), the CH proximately reports the changes.

1.2 Challenges for Clustering in Wireless Sensor Network

Many hierarchical clustering algorithms have been presented in researches and deployed for application based on some parameters. The selection is left with some open issues with the need to be addressed especially with the advent of technologies like the internet of things (IoT), Vehicular Ad hoc Network (VANET). As everything will be connected through the internet, sensor nodes will be employed to connect things wirelessly for data transfer. So, security, connectivity, retrieval, and storage of data are some of the challenges facing clustering techniques in WSN as the network operates in an open environment. Some of the key challenges are mentioned below as:

1.2.1 Security

The sensor nodes are deployed openly in a hostile environment, and clusters are created therein. A passive or active attack can be launched in WSN to monitor the communication links illegally, eavesdrops, and destroys the nodes. The clustering protocol ai attacked when the cluster is disturbed by an attacker. In a cluster, an attacker may mislead the cluster by building inconspicuous nodes or destroy some nodes by creating a neighbor with duplicates. Cryptographic mechanisms like encryption and hashing may not be effective for secured transmission of data as a result of limited resources [5]. However, to build a secured clustering in WSN, secure CH election, the secure formation of clusters, secure data aggregation by the CH, and secure transmission of data to the BS are required through secured clustering [6].

1.2.2 Reliability

Increasing network lifetime in WSN is the same as optimizing the power consumption and increasing the reliability of the network. Reliability remains the increase in the probability of the delivered packets. Reliability and power consumption are twin brothers, which means an increase in one necessitates the increase for the other. Decreasing the power consumption negatively retards the reliability of the network. Sleep and multipath are part of the strategies implemented in the protocol stack. The former is suitable for power consumption while the latter is excellent for reliability [7]. It is more or less a tradeoff. However, re-clustering at various intervals is implemented in many algorithms to ensure the reliability of the network and this also results in less energy efficiency. Further improvement should be made in the initial election of the cluster head to adapt the clusters of the network and maintain the connectivity through periodic re-clustering [8].

1.2.3 Quality of Service

A lot of clustering algorithms provide energy efficiency jettisoning QoS share in WSN. The metrics of QoS should be considered in designing the clustering protocol [9]. Localized QoS offer better advantages in terms of scalability, flexibility, and energy efficiency. In some applications and development in hardware technologies, QoS is becoming more familiar [10]. QoS is application- specific and required on end to end real-time and reliable systems. The purpose of QoS clustering is to choose the optimal path from the source node to the sink by meeting limitations such as delay, bandwidth, jitter, and packet loss rate. Low latency is called data packet delay, propagation delay, etc. Bandwidth is also an important metric for QoS.

1.2.4 Limited Energy

WSN consists of tiny devices with low power consumption and small sizes. They are battery operated sensor nodes with limited storage capabilities. The nodes are haphazardly deployed in an irregular manner which would be impractical to recharge the battery after exhaustion [11]. Clustering algorithms are highly energy-efficient and as such are utilized to periodically optimize the cluster formation stage, CH election, intracluster data processing, and inter-cluster communication stage.

1.2.5 Node Degree and Rotation of Cluster Head

Reducing the overhead for cluster formation is highly necessary for clustering algorithms. Optimal clustering is used to minimize the overhead with the election of CH.

The challenging part of the research is searching for the optimal node degree to decide the best cluster size. Clustering helps to minimize energy consumption by generating a desirable distribution of CH. CH rotation also ensures that a node with less degree can forfeit its present round for a better one to prolong the lifetime of the network. Optimum rotating frequency is required to prevent interruption of CH rotation and guard against power drainage of CHs.

Figure 1 represents the classification of clustering into probabilistic and non-probabilistic clustering. Probabilistic clustering is divided into Equal, Unequal, and K-Means while Heuristic-based, Weight-based, Highest connectivity, and Grid-based clustering fall under non-probabilistic clustering.

Fig. 1
figure 1

Classification of probabilistic and non-probabilistic clustering

2 Related Works

The lifetime maximation of WSN which of prime importance and others are the metrics for evaluating the performance of WSN protocols. Many efforts in the scientific literature from different perspectives will be discussed in this section.

LEACH is one of the eminent routing algorithms used. It was the first proposed state of art routing protocol in WSN. As stated earlier, the primary goal is to ensure uniform distribution of energy consumed among nodes by rotation the role of CH could be surrendered from the nodes with higher energy dissipation. It has two operation phases; the setup, and the steady-state phase. Depending on the CHs percentage and the frequency of a particular being the CH, any node can be selected. The CH is chosen in a randomized manner. This suffers some setbacks. Furthermore, the optimal location of CH is a challenge.

It was observed that there were a variety of LEACH protocols that exist. The famous being centralized- LEACH. In the LEACH-C, it is the responsibility of the base station to create the cluster. This is done by receiving the information that has to do with the energy and location of the nodes to determine the number of CH and divide the network into the cluster. In this algorithm, BS calculates the average energy of all the nodes and all the formed clusters to check for those whose energy is above the average energy. A CH will be selected from such nodes with sufficient energy. The algorithm uses the Minimum Spanning Routing Tree approach to select the smallest energy routing path to forward the broadcast message involving the clustering and CH to all nodes. However, the overhead increases during the reselection of the CH. In [12], LEACH-DT is proposed in which the probability of CH is modified by incorporating the distance factor.

PEGASIS was introduced in [13]. Instead of creating a cluster of nodes, it forms a chain of nodes to transfer data between sensor nodes and its neighbors. The chain is created using a greedy algorithm and the chain is managed by the chain head which is also selected randomly for each round among the chain nodes i.e. signal strength is used to keep the information of the closest neighbor. Hybrid Indirect Transmission (HIT) was suggested in [14]. It is a hybrid protocol that combines both LEACH and PEGASIS.

It has one or more clusters. The protocol devices a means in which the cluster member forms a tree turned up in the CH to reduce the energy consumption and prolong the lifetime of the network. There is also a tree rooted at the BS formed by the CH.

Most of the prevailing energy-aware routing protocols suffer unbalanced energy consumption which results in inefficient load balancing and compromised the network lifetime. Adaptive Energy-Aware Cluster-based Routing protocol (AECR) for WSN was presented in [15]. The work was proposed to improve energy conservation and data delivery performance. To avoid random cluster formation, the algorithm generates a balance sized cluster based on node distribution. New conditions are exploited to dynamically shift the role of CH.

Centralized routing also employs optimization methods such as linear programming, ant colony optimization, and other heuristic methods to locate a better route based on global information on energy consumption and topology [16]. Multi-objective Fractional Artificial-Bee Colony (MFABC) algorithm for energy routing in WSN was proposed in [17]. Maximation of the energy of the network and lifetime of nodes by selecting CH was the goal of the work. MFABC is used to control the convergence rate of the ABC algorithm based on the following; distance traveled, energy consumption, and delays to realize its objectives. Optimization is used to select the CH optimally and fractional theory is included to generate the food source.

Recently, some of the energy-efficient protocols are based on equal and unequal size clustering. In [18], the Unequal Clustering Size (UCS) algorithm has been developed. This is to maximize the lifetime of the network. In this algorithm, a two-layer network pattern is suggested and each cluster has the same size. It also allows the collection of a large amount of data from the source nodes. However, it is not suited for the large network as only two hops are employed for the transmission of data. Unequal Cluster-based Routing (UCR) protocol was presented in [19]. The network is divided into clusters of unequal size. The clusters close to the sink node exhibit a smaller size of cluster and size increases as we move away from the sink. The cluster with a smaller size can maintain energy to transmit within the cluster. The problem of this algorithm lies with the randomness in choosing the CH.

3 Probabilistic Clustering

Clustering methods are prevalently used to conserve the energy expended by the nodes due to the transmission of data. To conserve communication bandwidth, clustering helps in limiting the scope of inter-cluster communication among the cluster heads (CHs). This will surely reduce the redundant exchange of information among the nodes as the CHs fuse and aggregate to prevent the presence of duplicate data. In probabilistic clustering, probabilities are initially assigned to the nodes to decide to become the CH at a particular round [20]. The primary or random criteria are considered for the election of CHs and to limit the energy consumption and improve the efficiency of the network, other criteria should be satisfied. From literature, many probabilistic clustering algorithms have been proposed to prolong the longevity of the wireless sensor network. They can be categorized into Equal clustering, Unequal clustering, and K-Means clustering. Some probabilistic clustering algorithms used to enhance the lifetime of WSN will be elaborated.

3.1 Equal Clustering

3.1.1 Low Energy Adaptive Clustering Hierarchy (LEACH)

LEACH is regarded as the first and furthermost prominent clustering algorithm in WSN. It is amenable to a homogenous sensor network. The basic concept is to form clusters based on the strong received signal strength (RSS) received from the head of clusters. This enables the cluster heads (CHs) to be used as gateways to reach the destination [21]. The sole aim of introducing LEACH is to reduce power consumption [22].

LEACH segments the network into clusters and each cluster has a leader that rotates roundly. CH sends data directly to the base station (BS) and uses a data aggregation technique to decrease the energy consumption of the nodes and extend the lifetime of the network. The operation of LEACH is performed in two stages: setup and steady-state phase. The cluster setup stage is the one in which each node decides on contending for CH. The decision depends on the desired percentage of CH in the network and the number of times a node has been a CH. Sequel to this choice, each node is randomly assigned a value between the values (0 and 1). The value is then compared with a threshold as in Eq. (1), if the threshold is greater than the value, the node is elected as the CH. Then, the CH invites the remaining nodes to join it by broadcasting advertising messages, and a cluster is made for the current round. The nodes will then be allowed to transmit data to CH by scheduling time division multiple access (TDMA). In the steady-state stage, data transmission starts and each sensor node sends according to the time slot. It can then be turned off and the CH should always be ON to receive data from the sensor nodes.

$$T\left(si\right)=\left\{\begin{array}{*{20}c}\frac{Pi}{1-Pi(r mod\frac{1}{Pi})}& if \,s(i) \in G \\ 0 & otherwise\end{array}\right.$$
(1)

where (si) is threshold of ith sensing node, i is integer value, Pi is probability for selection of cluster head of ith sensing node, r is present round, G is space of cluster having sensing nodes.

LEACH assumes that every node has adequate radio energy when the communication between the nodes and the CH and CH or BS is in progress which is considered a waste of energy [23]. Also, the CH gathers the data from the sensor nodes and sends the aggregated data to BS through a single-hop communication link [24]. Though it is simple to implement LEACH, but not suitable for heterogeneous networks because the remaining energy of the node is not regarded, thereby rendering it less useful for a bigger and heterogeneous network.

3.1.2 Hybrid Energy Efficient Distributed Clustering (HEED)

HEED was presented by Younis and Fahmy [25]. It is a multi-hop WSN clustering that explicitly considers energy as a provision for energy-efficient routing in WSN. It is different from LEACH in terms of a random selection of CH. In HEED, CH is elected considering two-hybrid parameters. The first parameter depends on the node’s residual energy while the second is the intracluster communication cost of the contending nodes. The elected CH is expected to possess relatively high average energy than member nodes. This will lend credence to the main goal of HEED which is to achieve an even distribution of CHs in the WSN.

Equation (2) dictates the probability that a node becomes CH. The percentage of CH is initially established to assume that each node computes it CHprob of becoming CH a prior.

$$CHprob = CHprob \times \frac{{E_{residual} }}{{E_{\max } }}$$
(2)

where Eresidual is the estimated current energy of node, Emax is the reference maximum energy.

A certain threshold selected inversely proportional to Emax to be less than the value of CHprob to terminate in a particular iteration. Then each node is subjected to go through several iterations until CH is found, else it elects itself as the CH and notifies the neighbors through an announcement message [26]. The nodes double their CHprob value and proceed to the next iteration until the CHprob clocks the value of 1. The sensor node uses two types of status (tentative and final status) to advertise to its neighbors. The node is tentative if CHprob is less than 1 and its status is changed at later iterations to the regular node as it encounters lower cost CH, however, if CHprob is 1, the node is permanently the CH. In joining CH, the node selects the one with the least communication cost and it is the work of CH to forward the gathered data to the sink in a multi-hop manner rather than directly. HEED has the following advantages; wholly clustering method employing two important parameters for selection of CH, low-power level clusters that increase spatial reuse, and high-power levels for inter-cluster communication. In contrast to single-hop communication, multi-hop data transmission between CHs and BS conserve the resources and increases scalability compared to LEACH [27].

However, the pitfalls are as follows: it uses tentative CHs that may not become ultimate CHs provide room for nodes being uncovered, clustering levies significant overhead in the network which gives rise to energy dissipation, and the CHs nearer to the BS are subjected to the hot spot and may die earlier [28]. Some of the variants of HEED are discussed below:

3.1.2.1 Unequal Hybrid Energy-Efficient Distributed Clustering (UHEED)

UHEED is a combination of HEED and unequal clustering [29]. In HEED, there exists the hot spot problem which results in unbalanced energy consumption in unbalanced energy consumption in an equally formed cluster. So, there is a need for an unequal clustering protocol to lessen the problem to prolong the lifetime of the network. This gives birth to UHEED which is a variation of the HEED algorithm. UHEED creates an unequal size of the cluster based on the distance of CH to BS to reduce the quantity of intra- cluster traffic for CH nearer to BS. HEED defines an equal-sized cluster; the unequal size is created by UHEED with the help of the competition radius formula as in Eq. 3.

$$R = \left( {1 - c\left( {\frac{{d_{\max } - d\left( {s_1 ,BS} \right)}}{{d_{\max } - d_{\min } }}} \right)} \right)R_{comp}^0$$
(3)

where R is the competition radius, \({R}_{comp}^{0}\) is the maximum valued competition radius, dmax is the maximum distance between sensing node and base station, dmin is the minimum distance between sensing node and base station, d(s1, BS) distance between sensing node s1 and base station(BS), c is the weighted factor.

3.1.2.2 Rotated Unequal Hybrid Energy-Efficient Distributed Clustering (RUHEED)

RUHEED is used for mitigating the energy-hole problem as well. It is an improvement of UHEED with rotation being introduced [30]. It is characterized by three stages; CH election, cluster formation, and rotation. In the cluster formation phase, it uses energy- efficient unequal clustering. HEED is employed for the CH election and the rotation phase is used to select one of the member nodes as new CH, which possesses the highest residual energy in the absence of election protocol. In the rotation phase, whenever a WSN node completely exhausts its energy, the BS informs all the nodes to perform a new CH election and cluster formation. The main advantage of this protocol is that it reduces the number of stages for the CH election, and the formation of clusters to further reduce the number of control messages [31].

3.1.2.3 Simplified Hybrid Energy- Efficient Distributed Clustering (sHEED) and Enhanced Hybrid Energy- Efficient Distributed Clustering (EsHEED)

sHEED is an optimized form of HEED. It is similar to HEED in terms of CH selection but with less energy consumption and reduced complexity [32]. EsHEED is an improvement of sHEED as it divides the network into two levels. It consists of Superior CH and normal CH. The Superior CH(SCH) is the CH of level 1 and is superior to the CH in the second level. The CH of level 2 transmits the data of all its clusters and the SCH of level 1 aggregate and forward the data to the BS. The total communication cost is incurred by the CH needed to communicate to BS is reduced by the SCH as a result of a decrease in the number in CH required for the exchange of the information. This is due to the effective path selection from each node to BS.

3.1.2.4 Stable Election Protocol (SEP)

In [30], SEP is described as a heterogeneous energy-aware algorithm used to cater to the presence of heterogeneous nodes to prolong the stability period in WSN. This means that the election probability is weighted by the initial energy of a node based on the remaining nodes in the network. According to the remaining energy in each node, CH is elected from the weighted selection probabilities.

By employing parameters like normal nodes (α), a fraction of advanced nodes (m), and additional energy factors, the stability of the clustering process is improved. The advanced nodes become CH more frequently than normal nodes because they possess more energy compared to the latter. Consider a network consisting of n number of sensor nodes with m fraction of advanced nodes equipped with α times more energy than normal nodes. Then each advanced node will have the energy of E0(1 + α) and the total initial energy of the network is given in Eq. (4)

$$n\cdot E_0 \left( {1 - m} \right) + n\cdot E_0 \cdot \left( {1 + \alpha } \right) = n\cdot E_0 \left( {1 + \alpha \cdot m} \right)$$
(4)

where m is the fraction of advanced node, α is the fraction of normal node, n is the fraction of new node, E0 is initial energy of normal node.

Equation (5) represents the weighted probability for the normal node

$$P_{nrm} = \frac{{P_{opt} }}{1 + \alpha m}$$
(5)

where Pnrm is normal node election probability, Popt is optimal probability of node to be elected as cluster head.

Equation (6) represents the weighted probability for the advanced node.

$$P_{adv} = \frac{{P_{opt} }}{1 + \alpha m}(1 + \alpha )$$
(6)

where Padv is advance node election probability.

The threshold parameter is very important as it depends on the probability of the node. It dictates the election of CH because the respective node needs to choose a random value less than the threshold for it to be CH. The threshold values for both normal and advance are calculated as demonstrated in Eqs. (7) and (8)

$$T(S_{nrm} ) = \left\{ {\begin{array}{*{20}c} {\frac{{P_{nrm} }}{{1 - P_{nrm} \left( {rmod\frac{1}{{P_{nrm} }}} \right)}}} & {if\,S_{nrm} \in G} \\ 0 & {otherwise} \\ \end{array} } \right.$$
(7)

where T(Snrm) is threshold of normal node, Pnrm is the probability of normal node, Snrm is normal node.

$$T(S_{adv} ) = \left\{ {\begin{array}{*{20}c} {\frac{{P_{adv} }}{{1 - P_{adv} \left( {r\bmod \frac{1}{{P_{adv} }}} \right)}}} & {if\,S_{adv} \in G} \\ 0 & {otherwise} \\ \end{array} } \right.$$
(8)

where T(Sadv) is threshold of advance node, Padv is the probability of advance node, Sadv is advance node.

3.1.2.5 Variants of SEP

Enhanced Stable Election Protocol (E-SEP)

The goal of SEP is to achieve maximized robust self-configured WSN. In E-SEP, there is an introduction of intermediate nodes in addition to the advanced and normal nodes which make it a three-tier node [33]. Nodes select themselves as CH by considering energy levels providing uniformly distributed energy efficiency among the nodes.

The energy of the intermediate nodes is \(E_0 \left( {1 + \mu } \right)\) and \(\mu = {\raise0.7ex\hbox{$\alpha $} \!\mathord{\left/ {\vphantom {\alpha 2}}\right.\kern-0pt}\!\lower0.7ex\hbox{$2$}}\)

So, the total initial energy improved by the addition of an intermediate node is computed as in Eq. (9)

$$n \cdot E_0 (1 - m - b) + n \cdot E_0 \cdot (1 + \alpha ) + n \cdot b \cdot E_0 \left( {1 + \mu } \right) = n \cdot E_0 \left( {1 + \alpha \cdot m + b \cdot \mu } \right)$$
(9)

where m is the fraction of advanced node, α is the fraction of normal node, n is the fraction of new node, E0 is initial energy of normal node, b is the proportion of intermediate node.

Equation (10) represents the weighted probability for the normal node

$$P_{nrm} = \frac{{P_{opt} }}{1 + \alpha m + b \cdot \mu }$$
(10)

Equation (11) represents the weighted probability for the intermediate node

$$P_{adv} = P_{opt}*\frac{1 + \alpha }{{1 + \alpha m + b \cdot \mu }}$$
(11)

Equation (12) represents the weighted probability for the advanced node

$$P_{adv} = P*\frac{1 + \alpha }{{1 + \alpha m + b \cdot \mu }}$$
(12)

E-SEP is more robust in terms of resource sharing and network lifetime [30, 33].

Zonal Stable Election Protocol (Z-SEP)

In this protocol, some nodes are allowed to communicate their information directly to the BS while others apply the clustering method to transmit data to the sink i.e. direct transmission and transmission through the CH. In SEP, the extra energy of higher-level nodes is under-utilized and therefore Z-SEP is used to achieve minimum dissipation of energy. Because the energy of the nodes is not fully utilized, Z-SEP is employed to divide the network into Zone0, Head Zone1, and Head Zone2 based on the state of energy following the vertical component of the network field.

According to [34], Zone1 lies between 20 < Y ≤ 80, and normal nodes are deployed randomly there, Head Zone1 is in the range 0 < Y ≤ 20, and half of the advanced nodes are randomly deployed there as well, and in Head Zone2, 50 percent of the advanced nodes are randomly deployed in the range of 80 < Y ≤ 100.

Normal nodes communicate directly to the BS because of their energy level and advanced nodes are deployed in Head Zone1 and Head Zone2 as a result of their energy level and send data to BS through clustering and CH is elected from the nodes in Head Zone1 and Head Zone2. The desired weighted probability for the advanced node becoming CH and threshold for the task is as in [30]. The two equations will be used to select the CH and the broadcast message will be sent to the nodes in their zones to send the data being collected and the CH aggregate and forward the data to the BS. Consequently, it will help improve stability and network lifetime.

Enhanced Energy Zonal Stable Election Protocol (EEZSEP)

From [35], the main problem with E-SEP is that data is not conveyed if the threshold is not attained. EEZSEP is used to overcome the problem. EEZSEP is a heterogeneous clustered protocol that holds a three-level tier. Advanced, intermediate, and normal nodes with different energy are used with a consecutive decrease in the energy level.

The network field in EEZSEP is divided into Zone1, Zone2, and Zone3. The nodes are deployed concerning their energy levels. Advanced nodes are deployed in Zone1 and Zone3 while intermediate and normal nodes are deployed in Zone2. To divide the network into three zones, a Zone division algorithm is used and the nodes are deployed into zones based on the level of their energy. Zone1 and Zone3 send their data to the BS through clustering while Zone2, a combination of intermediate and normal nodes is closer to the BS, and because of clustering, intermediate form the CH and normal nodes send their data to the CH. This will extend the lifetime of the network as it increases the stability of the network and throughput [36].

Multi-Zone Stable Election Protocol (MZ-SEP)

EEZSEP is based on a hybrid of multiple triangle zones distribution and SEP protocol. This algorithm produces partitioned triangles and the repartition of CH is possible to prolong the energy efficiency in the network [37]. The parameters used are distribution uniformity of CHs and distance between CH and the members [38]. The main goal is to optimize the energy expended during communication. In MZ- SEP, 33% of the CH secure their positions very close to the BS and the nodes attach themselves to the CH based on proximity. Therefore, the distance between CHs and BS should be less than or equal to do.

In this protocol, zones are created using virtual zones to organize nodes. The zones have the shape of “V” and are inspired by Particle Swarm Optimization (PSO) algorithm [39]. The energy consumption of CH is a function of the distance between the position of BS and the amount of data forwarded to the BS [40].

3.2 Unequal Clustering

The issues of a hot spot in WSN is also prevented by using unequal clustering techniques [41]. Unequal Clustering is utilized for load balancing between CHs. As shown in the figure below, unequal clustering is employed to decrease the size of the cluster nearer to the BS chiefly because the higher the cluster size, the lengthier is the distance between CH and BS. This implies that a smaller cluster has a smaller number of cluster members with less intra cluster traffic which will result in less energy consumption and spend less energy on inter-cluster communication. Similarly, a higher cluster size possesses more members and consume higher energy for intra-cluster communication. In unequal clustering, entire CHs use the same energy so that cluster leader closer to the sink spends an equal amount as that farther from the sink. This helps to alleviate the hot spot problem through load balancing.

Unequal clusters are constructed if the node can determine its competition radius, Rc as in Eq. (2) [29]. Considering the heterogeneous network, residual energy of nodes and distance between nodes are used to calculate the competition radius as in Eq. (13)

$$R_c = \left[ {1 - \alpha \frac{{d_{max} - d(si,Bs)}}{{d_{max} - d_{min} }} - \beta \left( {1 - \frac{E_r }{{E_{max} }}} \right)} \right]R_{max}$$
(13)

3.2.1 Variants of Unequal Clustering Algorithms

3.2.1.1 Probability Driven Unequal Clustering Mechanism for WSN (PRODUCE)

There have been many types of research on hot spot problems caused by heavy relay traffic sent through the CH closest to the BS. PRODUCE is not an exception as it also helps to eliminate the problem. It is a randomized and distributed protocol consisting of unequal clusters of variable sizes. In this case, intracluster communication is the main goal of the clusters with larger cluster sizes while the clusters closer to the BS concentrate more on inter-cluster communication because they have fewer members in the clusters. This is so because, in multi-hop communication, the sensor nodes closer to the sink are overburdened with heavy load and may experience premature death of nodes.

PRODUCE uses unequal clustering to carefully organize the network in unequal sizes using localized probabilities as well as multi- hop routing (using stochastic geometry). The protocol gives privilege to distant clusters to focus on inter-cluster data processing and thus have larger sizes while those closer clusters will be utilized for inter- clustering to eliminate hot spot. This will progress the lifetime of the network and coverage time in the network where the density is greater [42].

3.2.1.2 Energy Driven Unequal Clustering (EDUC)

EDUC represents a hybrid of a distributed unequal clustering and adaptive energy-driven CH rotation method. While the former is used to reduce the energy expended for inter-cluster data processing, the latter achieves balanced consumption in energy among the nodes within the cluster intending to reduce waste of energy. Clusters of unequal sizes are constructed by unequal clustering algorithms using unequal competition ranges. This will make the clusters closer to BS to occupy smaller sizes which conserves the energy for long-distance communication. After the establishment of clusters, then an energy-driven cluster head rotation strategy will be employed to shift the role of CH in a cluster once in the network. Therefore, the energy consumed in the rotation of CH will be reduced to the lowest ebb. EDUC is characterized by two phases; the cluster construction stage and the self-organized data collection stage. The cluster construction stage consists of competition of CH and cluster formation stage. In this regard, each node takes the position of CH once throughout the lifetime of the network and it is selected erratically and for rotation, the energy level of nodes is computed. Then transmission within the cluster is done within the jurisdiction of TDMA scheduling in the self-organized data collection stage and to avoid a collision, direct sequence spread spectrum (DS-SS) [43].

3.2.1.3 Location-Based Unequal Clustering Algorithm (LUCA)

The above algorithms proposed using unequal clustering determine the size of the cluster heuristically as the size of the cluster greatly affects the energy efficiency of WSN. So, LUCA constructs clusters as well with different sizes based on their location information (separation between a cluster and a sink). In LUCA, the size of the cluster is directly proportional to the increase in the distance from the BS. It is a rule of thumb in WSN that from the energy efficiency point of view, a tradeoff between intra and inter-cluster data transmission must be carefully considered to improve the performance of clustering [44], especially in equal clustering.

To eliminate the problem of the hot spot problem, LUCA also creates smaller cluster sizes closer to the BS and larger ones as the location of the CH becomes farther to the BS. After the deployment of the sensor nodes, distance from the sink using a GPS link device is computed by each node and is equipped with a back-off timer to form clusters (self-organized) using a random value generator, rand (0, 1). Nodes receive CH advertisement message and join the cluster based on the distance and choose the closest, else it selects itself as the head of the cluster.

3.2.1.4 Energy Efficient Unequal Clustering (EEUC)

Many of the clustering algorithms select node having more residual energy to be the CH and rotates the CH periodically to extend the network lifetime. There exists a problem of a hot spot in multi-hop when clusters conjoin to forward data to the sink, thereby leaving some areas of the network uncovered. EEUC is also an attempt used to solve the issue. The network is partitioned into clusters of unequal size so that CH nearer to the BS can save some energy for inter-cluster communication. EEUC forms a distributed network as the CH are elected through localized competition with no iteration. The competition range increases as its distance to the BS increases. This will make the clusters closer to the BS to have smaller sizes, utilize lower energy for intracluster communication and the remaining energy will be utilized for inter-cluster communication. However, for multi-hop routing, residual energy and distance of the node to the sink will be used as the parameters to select the adjacent relay nodes. EEUC helps to decrease energy consumption and improve the lifespan of the network compared to HEED and LEACH [45].

3.2.1.5 Energy Balancing Unequal Clustering Protocol (EBUCP)

This is another unequal clustering protocol in which the sensor nodes are grouped into layers and clusters from the assigned probability with respect to the distance to the BS. Each layer is having clusters and more clusters are present in the layer closer to the BS than those distant to it. Energy consumption is assured by making clusters nearer to the sink to have a smaller quantity of nodes so that more energy can be reserved for multi-hop routing. The energy balancing principle is used to distribute clusters among the layers so that balanced energy is dissipated in every layer and the energy consumption in each layer is nearly equal.

It is assumed that that the nodes are deployed in a circular area with the sink situated at the midpoint. Multi-hop data transmission in EBUCP relies on the energy layering balancing algorithm and unequal clustering algorithm. The nodes compete for CH using unequal clustering and CHs are selected in every layer using the tentative CH’s residual energy and its energy is used as a reference for energy consumption. The CH also uses residual energy to select the adjacent relay node [46].

3.2.1.6 Constructing Optimal Clustering Architecture (COCA)

In equal clustering, the cluster heads closer to the sink are tasked to forward much traffic which leads to an energy hole because the nodes are uniformly distributed especially in the multi-hop clustering model. Many works based on unequal clustering have been done to tackle the issue, but scalability and energy avoidance are still opened for research. COCA is a synergy of optimal clustering and distributed protocol for energy-aware CH rotation and routing. It focuses on building up an optimal cluster-based network to improve energy efficiency. The topology is made of partitioned layers (equal-sized units) in which the layers nearer to BS exhibit more clusters (clusters in a single unit have similar size). It uses an approach that the number of clusters in a unit decreases as the distance to BS increases. Rounds in COCA are divided into data transmission and the topology formation phase. The topology formation phase is also divided into CH election stage, cluster construction stage, and inter-cluster relay selection. In the CH election stage, the residual energy of all the nodes is distributed with the neighbors, and the nodes with the maximum are declared as the CH. CHs are chosen as the routing candidates and the one with the maximum residual energy is selected as the final routing CH. The energy consumption in COCA is three times that of UCR [47].

3.3 K-Means Clustering

In [48], an improved method for selecting CH is proposed. K-Means clustering operates by finding the CH with the least distance away from the centroid (Euclidean distance). The whole process is divided into three stages namely: initial clustering, re-clustering, and choosing CH. In the initial clustering, after which the algorithm is executed for forming clusters (LEACH protocol is used). The network is then categorized into k-clusters and member of the cluster joins the cluster according to Euclidean distance. Re-clustering is the next step where the centroid of a cluster is computed. The centroid of a cluster is the node pointing at the center of the cluster and then the center of each node is determined. In the CH election stage, the nodes are assigned ID concerning the center and the node which is nearer to the centroid will take a small number. CH is rotated and the subsequent node in terms of ID will be elected as the new CH. It improves the lifetime of the network but consumes more energy during the periodic reformation of the cluster which creates more network overhead.

4 Non-probabilistic Clustering

Non-probabilistic clustering algorithms are primarily deterministic [49]. The CH election is tied to the criteria as follows; node proximity i.e. node’s degree and connectivity, information collected from the neighbors, remaining energy, transmission power, and mobility to establish a combined weight [20]. The formation of a cluster is based on single and multi-hop communication between nodes and their neighbors. This makes it time-consuming and complex as a more intensive transfer of messages is required. Non- probabilistic clustering is generally categorized into Heuristic, Fuzzy logic, weight, highest connectivity, and grid-based clustering. Heuristic-based clustering provides better distribution of CHs across the network by obtaining the optimal solutions about the best cluster sizes and optimum series of nodes as CHs. Optimization techniques like Genetic Algorithm (GA), Differential Evolution (DE), Ant and Bee optimization (ABO), and Bacterial Foraging Algorithm (BFA) constitute this category and use the parameter of fitness to maximize their objectives. Fuzzy logic-based clustering is also used to elect the best categories of CHs in the network when uncertainty is high. Fuzzy parameters like distance to the sink, residual energy, node degree, and node centrality are used to choose the CH. Weight-based clustering creates the cluster in a distributed manner and the CH is elected non-sporadically. It utilizes the combination of the parameters like mobility, transmission power, node degree, and remaining energy to select the CH. Grid-based clustering divides the network into virtual grids. The selection of CH is performed by the nodes which makes it a better candidate for a heterogeneous network. It has some salient features which help in extending the lifetime of the network. Some of the most non- probabilistic clustering algorithms used to enhance the lifetime of WSN will be elaborated.

4.1 Heuristic Based Clustering

4.1.1 Genetic Algorithm Based Energy Efficient Clustering (GABEEC)

GABEEC is another algorithm that is proposed to optimize the lifetime of WSN. It is similar to that of the LEACH protocol. GA is utilized in rounds to improve the lifetime of the network. It operates in two stages; the set-up phase creates clusters which are kept constant throughout. In each round, static clusters are formed with varying CHs. The round in which the first node dies, the round in which the last node dies, and the cluster distance constitute the parameters of the fitness function. [50].

4.1.2 Clustered WSN Using Fuzzy Logic and GA (CFGA)

It is an efficient protocol that provides balanced energy for intra-cluster data processing and inter-cluster communication. This approach is employed to prevent node which is not capable of being CH. The single-step method is employed for intra-cluster data processing while the multi-step method is used for inter-cluster communication. Initially, each node reads its fuzzy module to check its capability of being the CH, it would then be ready. To communicate with the BS, optimum nodes are selected by using the location of CHs based on the energy consumed and GA [51].

4.1.3 GA Energy Efficient Protocol (GAEEP)

The algorithm uses GA to find the locations and the best number of CHs to minimize the energy dissipated by all the sensor nodes in a communication process. GA is applied to advance the stability period of the network. The operation of GAEEP is divided into rounds characterized by the set-up and steady-state stage. In the set-up phase, location and the optimum number of CH are found by the BS to create the clusters, as a result, cluster members are attached to the CHs. The steady-state is the stage whereby the sensed data are collected in frames by the CHs and then forward the frames to BS. A node located close to the BS transmits data straight to it. CHs utilize CDMA during data transmission to reduce energy consumption and inter-cluster collision. It is employed for both homogenous and heterogeneous networks. [52]

4.1.4 GA Based Energy Efficient Clustering Hierarchy (GAEECH)

Many clustering algorithms have been proposed using GA, but with reduced stability period of the network. GAEECH improves both energy consumption and the stability period of the network. To achieve these, the fitness function is enhanced with better parameters to achieve more balanced clusters. The fitness parameters used include; total energy consumption per single data collection round, CH energy dissipation, the energy consumption of CH, and the standard deviation of energy expended between clusters. Its operation is divided into cluster formation and data collection phases. In the cluster formation phase, all the GA operators are applied on the random initial population up to the end with which the best chromosome representing cluster, and its values for CH and members are selected. The data collection phase is the intracluster data processing stage and the transmission of the aggregated data to the BS [53].

4.1.5 Energy Efficient Clustering Using GA and Mobility

In this approach, the mobility characteristics of the sensor node are utilized to optimize the communication distance between the CH and BS. The mobility feature is used to reconstruct CHs in WSN. GA is applied to search for the optimum locations of the CHs that are calculated in each round. For every round, GA is used to determine the location of the CH. New locations for the CHs are added in the set-up stage (LEACH). The protocol outperforms LEACH in terms of average remaining energy and lifetime of the network [54].

4.2 Fuzzy Logic-Based Clustering

4.2.1 Fuzzy-Logic Based Distributed Energy Efficient Clustering (FDEEC)

In FDEEC, the CH election depends on the following input parameters; residual energy, nodes degree, and the neighbor’s residual energy. The input parameters are collected and then transformed into Fuzzy Linguistic variables to carry-out fuzzy logic analysis. The fuzzy logic system is used to calculate the chance of the node becoming the CH [55].

4.2.2 Energy-Aware Distributed Clustering Using Fuzzy Logic (ECPF)

This algorithm is an energy-aware clustering protocol that is a synergy of non-probabilistic CH election, Fuzzy-logic, and on-demand clustering. This will enable the network to achieve a longer life because of the remaining energy of the node. Its operation is carried out in rounds and each round is characterized by a set-up and steady-state phase. The CH is selected in a distributed manner (for local information), the choice depends on the remaining energy of the node (i.e. it is proportional to delay). The node with the earliest delay becomes the tentative CH, and in the subsequent round, the final CH is chosen on a cost-based. The fuzzy logic is applied for checking the fitness (cost of the node) to become the final CH. The parameters used to estimate the costs are the node centrality and node degree [56].

4.2.3 Hierarchical Clustering Using Fuzzy Logic and LEACH

In [57], clustering is achieved based on LEACH in addition to Fuzzy logic. It achieves evenly distribution of load to upsurge the life expectancy of the network. The residual energy expected of sensor node and node residual energy is the parameters used to select the CHs. The frame number of each is first computed and next is to calculate the expected residual energy which is the difference between residual energy and expected consumed energy of the node to be CH. The output serves as the input to the fuzzy system using the if- then rule. The node will higher probability to be CH if the residual energy is higher.

4.3 Weight-Based Clustering

It is a non-probabilistic clustering structure in which a cluster is formed in a distributed manner and the CH is elected non-sporadically. It is invoked when there is a drop in the connection between the member node and the CH. It utilizes the combination of the parameters like mobility, node degree, remaining energy, and transmission power to select CH. It provides load balancing by reducing members in a particular cluster.

4.3.1 Multi-hop Routing Protocol with Unequal Clustering (MRPUC)

This protocol is an energy-aware clustering approach used to conserve the energy expended by the nodes in WSN. It is a distributed energy-efficient approach as each node gathers correlative data of its neighbors and the node with the highest is elected as the CH. MRPUC uses some measures to recompense the energy of the sensor nodes [58]. Firstly, nodes that have maximum residual energy are nominated the CH and the clusters closer to the sink possess smaller cluster sizes to conserve energy for data processing with a cluster which will later complement inter-cluster routing. Secondly, regular nodes use not only the proximity to CH but also the residual energy of CH to join the cluster. Thirdly, the CH chooses relay nodes as the nodes with maximum residual energy and minimum energy consumption to extend the lifetime of the network and this is achieved by constructing an inter-cluster routing tree as the backbone network, then data is transmitted efficiently through the multi-hop. The energy efficiency outperforms those of HEED and COCA.

4.3.2 Energy-Aware Distributed Unequal Clustering Protocol (EADUC)

This is also another energy-aware distributed unequal clustering mechanism used in the heterogeneous network. Competition ranges (radii) between nodes and neighbors are used to create uneven cluster sizes. The ratio between the average residual energy of the neighboring node and residual energy of the node is calculated. CHs are elected on the calculated parameter and the size of the clusters increases with increasing from the BS. So, clusters closer to the BS are characterized by smaller sizes which help to recompense the energy consumption and improve the life expectancy of the network. EADUC operates in rounds and each round is divided into the set-up and data transmission just as LEACH. CH competition, collection of neighbor node information, and the cluster formation phase form set-up phase while data transmission phase has to do with data processing and the multi-hop routing based on the constructed routing tree [6]. A threshold is introduced for inter-cluster communication and CH is allowed to utilize a single-hop when the distance to BS is not greater than the threshold value, otherwise, relay nodes with higher residual energy are used. From Eq. (2), α and β dictates the effect the residual energy and the distance to the sink on the cluster size and Rmax represents the number of generated CH binding with α and β. In EADUC, there are no uncovered points. It, therefore, prolongs the lifetime of the network significantly.

4.3.3 Energy Balancing Unequal Clustering Approach for Gradient-based routing (EBCAG)

EBCAG is an effective unequal clustering method which recompenses energy consumption in WSN. It combines both unequal clustering and inter-cluster communication based on gradient. It creates clusters of uneven sizes by grouping the nodes, where each node is characterized by a gradient value which represents the minimum hop count to the BS. The size of the cluster is dependent on the gradient value of CH and the gathered data in the cluster is forwarded to the sink based on gradient descent. Initially, a gradient value is created for each sensor node from the least hop count and the network is then organized into clusters of unequal sizes. Based on the predefined threshold, tentative cluster head (TCHs) are randomly selected with probability, T, and the TCH become the final CH if it has the highest residual energy. The CH, through unequal cluster radius, collects the data from member nodes, forwards the aggregated to the BS through gradient descent-based routing. Concerning energy efficiency, EBCAG outperforms HEED, and EEUC. [59]

4.4 Highest Connectivity-Based Clustering

This is another category of the non-probabilistic clustering algorithm. It is a form of multi-level cluster hierarchy that is distributed in nature [18]. It exhibits Tree Discovery and Cluster Formulation stages. In this protocol, each node sends the number of its neighboring nodes to the nodes in the network and the node’s connectivity is put into consideration. With this, the node that bears the highest connectivity will be selected as CH, but the lowest connectivity is used in the case of a tie. As such the node already selected surrender being the coordinator of the cluster.

4.5 Grid-Based Clustering

4.5.1 Grid-Clustering Routing Protocol (GROUP)

The author in [60] proposed an efficient and scalable packet routing for large scale WSN. It is considered an energy-efficient routing algorithm that is application dependent i.e. detecting forest fire in real-time. In this protocol, a cluster grid is created in a proactive, random, and dynamical way. The nodes are dispersed in a field and they dynamically organize themselves into clusters at a particular time and each cluster is governed by CH. All the CHs combine to form a virtual grid. The sink sends the query data to all the nodes through the nodes and the member nodes send their data through their respective CH to BS. In the cluster grid construction stage, the primary sink is first elected, which is closer to the center of the network, starts the cluster grid construction by broadcasting Grid Seed election command to all nodes to select GS from its neighbor. The node with the maximum residual value becomes the GS and if two nodes with the same residual energy are present, the one with the earlier reply packet will be elected. The GS sends CH-election packet to all the nodes, the neighbors received the message in a distributed manner to compete for CH about the crossing point. The closer to the crossing point, the higher the chance of becoming the CH. Nodes join the CH as the member if their distance apart is 0.7R. The data forwarding stage is intracluster communication. The CH performs the data aggregation on the received data to save energy. The failure recovery stage is the phase in which a new CH is selected when CH failed and the new CH broadcasts CH-recovery packet to locate its upstream and downstream CHs. It is more scalable than LEACH. It also outperforms with an increase in the number of sensor nodes.

4.5.2 Grid-Based Reliable Routing (GBRR)

According to [61], GBRR is a grid-based routing protocol achieved from square grids to form virtual clusters. It combines both clustering and grid-based routing features to improve the lifetime of the WSN. Besides, a greedy algorithm is introduced to improve the reliability. GBRR divides WSN into grids with an equal square shape so that a grid may bear zero or many nodes. A cluster may be represented by a grid or some grids. The active node is elected to be the head of the cluster. Effective paths within and outside the cluster are calculated to lessen the intra and inter-cluster communication so that the source node will be able to communicate directly to the BS bypassing the CH. The area covered by the cluster will be large if several grids represent the cluster. The node that transmits to the BS may deplete its energy faster due to energy consumption. However, its results when compared with LEACH, EADC, and EADUC show its ability to improve the QoS of WSN.

4.5.3 Grid Sectoring (GS)

In a gird clustering [62], the field is partitioned into grids of equal size and is then further simplified into small sizes whereby each signifies a cluster. This lingers until the best number of clusters is realized (preferably 5% of the total number of nodes). This is done to improve load balancing in the WSN. The node closer to the center is selected as the CH and the number of nodes per cluster is not constant which may result in isolated nodes.

4.5.4 Grid-Based Data Aggregation Scheme (CBDAS)

This is another grid-based clustering protocol in WSN. The sensor network is apportioned into grids of 2-D cells having a head nominated by the BS because of its highest residual energy. The cell heads are connected to create a cyclic chain to provide a bidirectional transfer of the gathered data. Ordinary nodes present in the network transfer their sensed data to the cell head, then the cell head must aggregate its data with the gathered data and forward it to the BS. For inter-cluster data transfer, the gathered data in each round floe from one node to the other through the chain, and the cell head transmits to BS through cycle leader. In this protocol, each of the nodes aggregates its data and moves from node to nodes to maximize the lifetime of the network. The greatest setback of this algorithm is that the cycle leader can be located far from BS, which may lead to the early death of nodes [63].

4.5.5 Grid-Based Hybrid Network Deployment (GHND)

GHND is another grid-based clustering that provides better load balancing and energy efficiency. In this algorithm, the field network is divided equally into virtually squared size grids, where a grid connotes a zone. It is centralized as the BS dictates the grid formation and selection of CH. The protocol is based on merge and split technology across the grids to balance the energy. The lower density zones are merged with the adjacent zones and the higher ones are split into subzones. The node that has the maximum connectivity is selected as CH in all zones. The results as in [64] buttresses that the algorithm performs better than LEACH, PEGASIS, and CBDAS in terms of stability and lifetime.

4.6 Dynamic and Balanced Load Based Clustering

4.6.1 Optimized Energy Clustering Routing Protocol-WL (OCRP-WL)

OCRP-WL is the load and task balancing clustering routing algorithm. By using inter and intra clustering load balancing minimizing the energy requirement. This protocol is based on heavy and complex load. Heavy load consisting several tasks and resources. Clusterization based on available resources for completing all tasks [65].

4.6.2 Energy-Efficient Cluster-Based Dynamic Routes Adjustment Approach (ECBDR)

This approach is based on dynamic route for load management. Different routes having different load. For maximize the energy saving sensing nodes and base station connectivity follow the dynamic approach [66]. A qualitative analysis also proving the comparisons among several inter and intra clusterization protocols related to dynamic loads [67]

4.6.3 Particle Swarm Optimization Based Clustering Algorithm With Mobile Sink for WSNs (PSO-MS)

Dynamic high node density nearby base station is preferable to have the cluster head. Dense cluster distance matter regard communication cost.. [68]

4.7 Distribution and Prediction Based Clustering

Load look a heading in wireless sensor network is very critical for smart clusterization. [69] Regular collection and aggregation of huge data increasing the traffic in network. Distribution and prediction based algorithms proving the appropriate solutions for management of this futuristic incoming load [70,71,72].

5 Comparison and Discussion

All the aforementioned algorithms whether probabilistic or non-probabilistic are employed towards reducing the energy consumption in WSN. Energy efficiency is one of the most principal goals in WSN that allows longer network efficacy and saves energy cost for both intra and inter-cluster communication. Table 1 demonstrates the comparison of the variants of HEED. The algorithms are compared in terms of energy efficiency, cluster stability, scalability, and speed of convergence. From the Table 1, it is observed that HEED has high cluster stability, moderate scalability, and reasonable energy efficiency. This is chiefly because the selected CH is expected to have relatively higher energy than the member nodes. However, in terms of energy efficiency, cluster stability, and scalability, UHEED, RUHEED, and SHEED tend to be higher than HEED and RUHEED is the best algorithm to converge fast because it has a reduced number of CH and controls messages. Table 2 is the table of comparison of both the probabilistic and non-probabilistic clustering in terms of cluster size and count, CH election, communication type, network type node type, and energy efficiency. The table contains the comparison of variants of weight-based clustering (non-probabilistic), unequal clustering, grid-based clustering, heuristic-based clustering, fuzzy-logic based clustering, and highest connectivity in terms of cluster size and count, CH election, communication type, node type, and energy efficiency. From the table under the category of weight-based clustering, it is observed that EADUC is the only algorithm that is amenable to a heterogeneous network and possesses high energy efficiency.

Table 1 Comparison of variants of HEED
Table 2 Comparison of probabilistic and non-probabilistic clustering

6 Conclusion

With the rising increase in the interest of WSN over the past few years, it is no doubt that clustering routing protocols have been the central core of research in WSN. This paper has presented a comprehensive highlight of both the probabilistic and non-probabilistic clustering algorithms used in WSN. It has also analyzed and compared the variants of the algorithms that fall under the topic based on the performance metrics such as energy efficiency, load balancing, cluster stability, scalability, the complexity of the algorithms, etc.