1 Introduction

Wireless Sensor Network (WSN) consists of hundreds or thousands of tiny battery operated electro mechanical devices known as sensors usually deployed in a remote geographical area (sensing field). A sensor can detect some type of inputs such as temperature, light, gas, pressure, motion, moisture, proximity, or some other phenomena from the physical environment (Akyildiz et al. 2002). These physical conditions are represented as signals, which is the output of sensor. These are converted into human-readable form at the sensing area and transmitted over a network to data collecting centers for further processing. Some drawbacks of WSN are limited bandwidth, less memory, short communication range, limited processing power, and limited battery capacity (Yick et al. 2008). The WSN architecture is mostly unstructured (Lee and Lee 2012; Akshay et al. 2010) and consists of set of homogeneous or heterogeneous nodes, Data Collectors (DC), and a Base Station (BS). The BS is also termed as Sink. Mobility can be incorporated in any of these three components. Mobility of sensor nodes, Mobile Data Collectors (MDC) or Sink nodes makes the network topology management much complicated to handle with. When a node senses a change in environmental condition, it may detect and record the change. The collected information is forwarded to the BS/Sink node through single hop or multi-hop communication mechanisms (Sujeethnanda et al. 2013). WSNs have high acceptability in many application areas. Some of them are listed below (Xu 2002; Mainwaring et al. 2002; Milenković et al. 2006; Akyildiz et al. 2007; Othman and Shazali 2012; Ramson and Moni 2017).

  • Agricultural monitoring

  • Habitat monitoring

  • Home intelligence

  • Forest monitoring

  • Health monitoring

  • Vehicular network monitoring

  • Under water monitoring

  • Military applications

  • Natural disaster relief

  • Seismic monitoring

After deployment, the sensor nodes behaviour can be static or dynamic. Static WSNs are easy to handle since network topology won’t change after deployment. The dynamic nature that allows the nodes to move within a geographical area leads to the formation of Mobile WSN (MWSN) (Ma et al. 2008). Routing is used to choose the most efficient path (in terms of delay, traffic, etc) to forward the data to the central computing station (BS) (Al-Karaki and Kamal 2004; Chang and Tassiulas 2004). The long routes can increase network delay while short routes can always cause heavy traffic and energy drain at intermediate nodes. Designing a best cluster based routing protocol will help to maximize network lifetime, efficient data aggregation with load balancing, reduce routing delay, fault tolerance, etc. Mobility can also be incorporated in the data collection nodes known as Mobile Sink (MS).

1.1 Challenges in MWSN

The mobility of nodes in the sensing field may create the following challenges in data aggregation and forwarding (Singh and Sharma 2015; Patil and Vijayakumar 2016).

  • Inconsistent neighborhood information

  • Frequent link failure increases the packet loss rate

  • Route change may result in packet delivery delay

  • Additional handover time to adapt topology changes

  • Communication complexity to join the network

  • Mobility increases packet collision

  • Reduction in network lifetime due to energy depletion

  • Fault-tolerance to handle link or node failures.

1.2 Motivation

Most of the real world applications nowadays demand for sensing devices to collect data from the environment. The sensor networks with stationary nodes are easy to manage and data forwarding can be done with the help of predefined routes. MWSN plays an important role for applications such as habitat monitoring, underwater navigation, health monitoring, vehicular networks, etc. The mobility is caused by the movement of objects or persons to which the sensor is attached. The challenges described in Sect. 1.1 becomes so crucial when the number of devices or nodes increases. A centralized approach for taking routing decision and data forwarding will not be sufficient to manage these issues. Distributed clustering is a mechanism to address the scalability problem in MWSN which improves energy consumption, PDR, network lifetime, and routing delay.

1.3 Contribution of this paper

The main contributions in this survey paper is given below.

  • Summary of existing surveys in the field of WSN is given in Sect. 2.

  • Discussed some of the distributed clustering algorithms (probabilistic and deterministic) for MWSN in Sect. 3.

  • Performance metrics used for the evaluation of routing protocols in MWSN are summarized in Sect. 4.

  • Analyzed the algorithms in the following three categories in Sect. 5.

    • CH selection process, results, and limitations

    • Performance metrics

    • Simulation parameters

  • In Sect. 6, the research challenges and future directions in cluster based routing of MWSN are discussed.

1.4 Summary

This section gives an overview of WSN and its application areas. The need of MWSN and the challenges in such networks are also described. The motivation behind this survey paper is given in Sect. 1.2. Finally the contributions in this paper are listed out in Sect. 1.3.

2 Related surveys

There are many survey papers that deals with routing protocols in WSN/MWSN. This section gives a brief description of existing surveys along with the advantages and limitations of each one.

2.1 Existing surveys on MWSN

The survey papers referred are presented in Table 1.

Table 1 Existing surveys on MWSN

2.2 Summary

Most of the papers mentioned here describes routing protocols for WSN. Cluster based routing in MWSN is described only in Angel and Raj (2017), Sabor et al. (2017) and Sara and Sridharan (2014). Recent algorithms are not available in Sara and Sridharan since it was published in 2014. Angel et al. summarizes only few works and the none of the performance metrics are evaluated properly. There is a detailed explanation of state-of-the art routing methods in MWSN in the paper published by sabor et al. in 2017. Energy Efficiency, network lifetime, delay, and scalability are analyzed. The PDR, bandwidth utilization, routing overhead, etc are not considered for evaluation in Sabor et al. (2017).

3 Distributed clustering and state-of-the-art algorithms in MWSN

This section gives an overview of clustering technique in WSN and reviews some of the existing clustering methods for MWSN in the literature.

3.1 Distributed clustering: the process and factors influencing

In non-cluster based routing protocols, all the deployed nodes will directly communicate with the BS. One issue with this approach is data redundancy at BS since nearby nodes sense and forward same data. Another problem is that energy constrained nodes have to send data to far away BS, which cause energy depletion at nodes and thereby reducing network lifetime. Clustering is an approach used to solve the above issues. The longevity is a term used to define the network lifetime of the MWSN. Dividing the nodes into minimal non-overlapping groups called clusters will be helpful for increasing the network lifetime. This grouping can be done based on geographical area, the transmission range of sensors, application domain, etc. The number of clusters can be fixed or variable. They can be of same size or different size (Fig. 1).

Fig. 1
figure 1

Clustering and non-clustering based data forwarding in WSN

The main design goal is to form minimum number of clusters that gives maximum connectivity and coverage (Tian and Georganas 2005; Ghosh and Das 2008). In a network, the existence of communication link (wired or wireless) between two nodes can be direct (single-hop) or indirect (multi-hop). Connectivity and coverage can be improved using clustering process. There is a leader in each group called Cluster Head (CH) and other nodes in the group or cluster are known as Cluster Members (CM). The selection of CH can be done in three different ways. CH node can be manually selected before deployment (preset), can be selected randomly, or parameter based (depending on residual energy, distance to BS, mobility etc.) selection also possible (Akkaya and Younis 2005).

It is the duty of the CH to aggregate and merge the data collected from its member nodes and pass it to the BS. The selection of path and forwarding of data is distributed among CHs. In such situations, CHs may act as relay nodes, they can control data traffic through time synchronization to achieve high packet delivery ratio. This distributed clustering process can be Proactive, Reactive, or Hybrid. In Proactive or table driven routing protocols, each node maintains a routing table. This will be updated periodically to note down the changes in network topology. The Reactive (on-demand) routing protocols search for the necessary path only when it want to transmit something. Hybrid routing protocols inherits the best features of proactive and on-demand methods. The cluster head selection can be static or dynamic. If the current network scenarios are taken into consideration, it is known as dynamic, otherwise static.

3.2 Distributed cluster based algorithms in MWSN

In this survey, classification is based on the CH selection approach that uses three different paradigms. They are probabilistic, deterministic or preset algorithms (Afsar and Tayarani-N 2014). The probabilistic algorithms are categorized in to random or hybrid. In random selection of cluster heads, a threshold for a node ’n’ is calculated using the following expression (1).

$$\begin{aligned} T(n)= \begin{array}{ll} \frac{C}{1 - C \times r\, mod\, \frac{1}{C}}\;, &{} if\,\, n \in N\\ 0\;, &{} \text {otherwise}, \end{array} \end{aligned}$$
(1)

where C is the percentage of CHs (selected based on coverage and connectivity), r is the present round, and N is the deployed nodes that have not been selected as CHs in the last \(\frac{1}{C}\) rounds. This threshold is compared with a random number [between 0 and 1] generated. Few nodes whose random number less than the threshold will be selected as cluster-head for the next round.

In the hybrid approach, the threshold is calculated using the method mentioned above along with some other features such as residual energy, distance to BS, etc. Deterministic algorithms use more concrete metrics to elect CHs. The four sub classes are Compound, Weight based, Heuristic and Fuzzy based methods. The compound method uses various metrics such as node degree, remaining energy, distance to sink, mobility factor, connection establishment time, link conditions etc. to elect a CH.

To produce balanced clusters, weight-based protocols combine metrics in to a weight. Heuristic approaches uses various techniques such as Particle Swarm Optimization (PSO) (Kuila and Jana 2014), Genetic Algorithms (GA) (Elhoseny et al. 2015), Ant Colony or Bee Colony Optimizations, Multi Objective Immune Algorithms (MOIA) (Sabor et al. 2018), etc. These techniques use different parameters in their fitness function to optimize their objective function. Fuzzy based algorithms (Nayak and Devulapalli 2016) are used to locate the optimal CHs by defining fuzzy rules based on various input parameters. In the preset method, CH is selected before node deployment. But this approach is not suitable for WSN where nodes are mobile in nature. Therefore, preset algorithms are application specific and not considered for MWSN.

Most of the clustering approaches in WSN is based on Low Energy Adaptive Clustering Hierarchy (LEACH) for static networks proposed by Heinzelman et al. (2000). In order to distribute energy almost equally among all sensors, cluster formation and cluster-head selection are done in an adaptive round robin fashion. The approach used is random probabilistic clustering. At any given time sensors themselves will be elected as CHs with some probability and their status is broadcasted to other nodes in the network. Nodes can join a CH that needs minimum communication energy. After formation of clusters, CHs will allot a time slot for its member nodes. CMs will work in Time Division Multiple Access (TDMA) schedule during which they can turn on their radio and communicate. CH will then aggregate or fuse the received data and send to central node (BS) through direct or indirect communication.

Fig. 2
figure 2

Classification of cluster based routing protocols in MWSN

Figure 2 shows the classification of protocols surveyed based on the clustering approach used. The routing of these protocols in the mobility based environment is briefly described below.

3.2.1 LEACH - M [2006]

In this LEACH-Mobile (LEACH-M) protocol (Kim and Chung 2006), Radio transmission range is limited to a specific threshold. The set-up phase uses the distributed cluster formation algorithm to organize nodes into clusters (hierarchical clustering) and to choose the Cluster Head (CH). In the steady state phase, all cluster member nodes will receive a request message to send data. Then each node starts sending data to CHs in their allotted TDMA time slot. After sending data till the next slot, member nodes will be in sleep mode. This will reduce the energy consumption considerably in non cluster head nodes. After some time, CH will fuse the data into signals. Due to the mobility of nodes or energy depletion of CH, there is a chance of data loss at nodes. If the data request message is not received by a node in two successive TDMA slots, it will transmit a request message to join a new cluster and enter into setup phase again. When a node joins or leaves a cluster, the corresponding CHs will update its TDMA schedule, and it will be broadcasted to cluster member nodes. CH will modify the membership lists also.

3.2.2 LEACH-ME [2008]

This is an enhancement of LEACH-M protocol where Cluster Head (CH) selection in the set-up phase uses the concept of remoteness (LEACH-ME) (Kumar et al. 2008). It is a measure of relative mobility with respect to time and immediate neighbors. In addition to the TDMA slot allotted for individual nodes for data transfer, a common TDMA slot is given to the nodes in which all of them will be active and can send its ID with a timestamp to their neighbors. Then each node will calculate its remoteness value with respect to their neighbors and the same will be broadcasted. From the received remoteness value, the node with minimum mobility factor and having high energy (Energy \(\ge \) threshold) is elected as CH. The data transfer from a node to the CH will be done in the steady state phase. This will happen only when the node receives a REQ_data message from CH. Due to the mobility of nodes, the nodes may not receive this message. To ensure this, CH will wait for one more TDMA slot. If no reply is received from the mobile node, that TDMA slot will be assigned to a new node. The mobile node will broadcast a join_REQ and will become part of a new cluster. Here the radio channel is assumed to be symmetric and first order radio channel is used.

3.2.3 CBR-Mobile [2010]

The CBR-Mobile (Awwad et al. 2010) is mobility supporting WSN protocol for both network layer and Medium Access Control (MAC) layer. This is a cross-layer design approach which uses a cluster based routing protocol for the network layer and in the MAC layer, a schedule based contention aware protocol. The goal is to improve Packet Delivery Ratio (PDR) and supporting mobility. CH is selected randomly and nodes may join a cluster based on the Strength of the Received Signal (RSS) from various CHs. CH prepares a TDMA schedule for each member nodes along with some free slots. Each schedule based timeslot has an original owner (OW) and an alternate owner (AW) to handle mobility. When a data_REQ message is received from a CH, if data is ready, OW adjust its transmission power to minimum and send the data. If no data available for OW, the slot is allotted to AW or to a new node. In the contention time slot, a contention message is announced by CH. If there is any new node that waits for joining may respond with a join message. If a mobile sensor node fails to join the new cluster in this slot, then it has to wait for the next contention period. When a mobile node from another cluster did not win any of the contention time slot, they will be given a chance to contend in the free slot. Now the CH will announce the new TDMA schedule with newly joined members. This method is known as an adaptive scheduling algorithm.

3.2.4 MBC [2011]

In 2011, Deng et al. proposed a Mobility Based Clustering (MBC) Protocol to handle mobile nodes in wireless sensor network. A graph is used to represent the network where nodes are vertices and wireless links are edges. A node’s condition is defined by its location, velocity, the direction of movement, and transmission range. The mobility model used is random way point with a symmetric radio channel.

In order to deal with the mobility of CHs, a threshold is calculated by multiplying Eq. (1) with the factors such as remaining energy and sensor nodes speed. This threshold is used for CH selection in the setup phase which is same as mentioned in the work by Heinzelman et al. (2000). The CH will advertise its selection with broadcast message to its neighbors. On receiving this, each sensor node will calculate a suitability value related to a CH based on the predicted connection time, remaining energy, degree of CH, and distance to CH. The nodes which are not selected as CHs, will send a registration message to the CH with larger suitability value. The scheduled time slot for its members is generated and assigned in the ascending order of the connection establishment time by the CHs. The CH will remove a node’s membership if it did not receive data in the assigned time slot. This free slot is allotted to a newly joined node and the new TDMA schedule will be broadcasted. The MBC protocol shows performance improvement when compared with CBR protocol and LEACH-M in terms of successfully received packets for varying number and speed of mobile nodes. The energy consumption and number of control packets required is also less when compared to above two protocols.

3.2.5 CIDT [2014]

A data gathering algorithm in Dense MWSN which is energy efficient is termed as Cluster Independent Data Collection Tree (CIDT) proposed by Velmani and Kaarthick (2014). This method make use of tree topology for cluster formation. The CH and Data Collection Node (DCN) are selected by considering the remaining energy of nodes, its mobility factor, RSS, connection time, and size of the cluster. The setup phase finds the optimal path to sink node by performing cluster head selection, cluster formation, and Tree construction for data collection. The sensed data is transmitted to CHs. The DCN from the lower level in the tree will forward the data collected from CHs to the upper level DCN and finally it reaches the sink. The CIDT minimizes the end-to-end delay by choosing the optimal DCT for data aggregation. The average energy consumption also reduced in this case. When compared to LEACH , HEED (Younis and Fahmy 2004) and MBC protocols, the Packet Delivery Ratio and throughput is increased in this algorithm.

3.2.6 VELCT [2015]

This is a enhancement to CIDT protocol proposed by Velmani and Kaarthick (2015), for data collection in MWSNs, called a Velocity Energy-efficient and Link-aware Cluster Tree (VELCT) scheme . The drawbacks in CIDT protocol is minimized in VELCT by constructing the location based DCT. The position of CHs in a different areas is considered for tree construction. The setup phase creates the topologies for Intra Cluster and DCT communication by selecting CH, forming the cluster, and constructing the data Collection Tree. The calculated threshold is used to choose the better CH. Sink node choose a particular DCN based on the communication range of nodes, its connection time, and RSS. The lowest level in DCT is the sensor nodes which collects data from surroundings and forwards to CH. The data packets collected by the CHs will be forwarded to next level DCNs and they aggregates the received data packets. DCN forwards the fused data to the next higher level until it reaches the BS.

3.2.7 E-MBC [2016]

An enhancement to MBC protocol called Enhanced Mobility Based Clustering Protocol for WSNs (known as E-MBC) was designed by Sahi and Khandnor (2016). The modification is, a special packet will be sent to CH by CM to inform that the member node is still alive.This happens when there is no data to be forwarded to CH in its assigned TDMA slot. This will help to identify the failure nodes. The network reliability is increased because of this fault tolerant mechanism. Energy consumption due to this packet overhead in WSN is very low since the special packet size is too small.

3.2.8 LEACH -MF [2017]

Fig. 3
figure 3

Fuzzy inference system for CH selection

Another approach to increase the lifetime of network and to minimize the packet loss was proposed in the paper titled An Enhanced Hierarchical Clustering Approach for Mobile Sensor Networks using Fuzzy Inference Systems (LEACH-MF) (Lee and Teng 2017). This Fuzzy Inference System (FIS) takes three parameters (balance energy, speed, and pause time of a node) as input and produces a probability value. A node will be selected as CH based on this probability. Each of these input parameters can take five possible linguistic values from the fuzzy set and after applying fuzzy inference rule generated using heuristics, it produces one out of possible nine linguistic outcomes as output (called chance). In order to convert the linguistic output (chance variable) to a numerical value that represents the probability, a Center of Area (CoA) defuzzification technique is used (Fig. 3). During the CH selection phase, if the random number, r generated (0 \(\le \) r \(\le \) 1) by the node is higher than the generated threshold, the node becomes a candidate CH. These nodes will calculate their chance values and broadcasts. A node becomes CH, if the chance value of a node is greater than every other value it receives. When a node advertises its status as CH, non-CH nodes select the closest candidate CH as CH and sends JOIN_REQ. After the clustering process, CH will generate the TDMA slots and will be sent to its members. A node will transmit DATA in their allotted slot and CH will fuse the received data and forward to the final destination called BS.

3.2.9 ARBIC [2017]

A clustering mechanism based on an Evolutionary Algorithm (EA) called Adjustable Range Based Immune Hierarchy Clustering Protocol handling mobility of nodes in WSN (ARBIC) was proposed by Sabor et al. (2018). It uses Multi Objective Immunization Algorithm (MOIA) to find the best CH. There are two phases: Cluster building and Data transmission phases. During the first step, the BS broadcast a “HELLO” message to all nodes and they will forward their basic information as reply. This message includes node ID, its location, initial energy, mobility factor (MF) etc. Based on this received information, BS will divide the sensing field into a number of cells and uses MOIA to find the best CH by considering the communication cost, residual energy, mobility factor etc. The average MF of the nodes should be greater than CH’s mobility factor. In order to do data aggregation efficiently, a multi-hop routing protocol is implemented by constructing a routing tree. From the selected CHs, the closest sensor node having high unused (balanced) energy and nearer to the BS is taken as best next hop. CH will receive a status message from BS which includes the next hop, the number of frames and, the sensing range. CH broadcast an advertisement with all its information. Depending on Link Connection Time (LCT), remaining energy, and communication cost, the node selects the best CH by sending a join_request_msg.

To tackle the situation of CH failure due to energy depletion, link break down or physical damage, a vice-CH is selected. A node with high remaining energy and low mobility is chosen as the Vice-CH. The data transmission phase starts by sending a request by the CH to its member nodes within its transmission range. On receiving this, nodes check their LCT value and determine its status as “LEAVE” or “REMAIN”. If LCT >frame time \((T_{fr})\), send data and status as “REMAIN”. Otherwise, send status as “LEAVE”. CH aggregates data received from all the CMs and forward as a single packet to BS. CH is assumed to be failed if it is not receiving data from its member nodes for a predefined time period. In this situation, CH will send a notice_msg to vice-CH. The vice-CH waits for the fault-tolerant period and sends an advertisement message with its ID to inform the member nodes about this change. The protocol simulation is done in two scenarios: with small and large number of nodes. It shows that ARBIC protocol outperforms the compared protocols (LEACH-M, CBR-Mobile, CIDT, VELCT, and MBC) in terms of average energy consumption, PDR, and end-to-end delay.

3.2.10 ECP-SC-MCL [2018]

A Graph clustering Paradigm was proposed by Mezghani and Abdellaoui (2018), where classification is based on Stochastic matrix and Markov Chain Clustering(MCL). Initially, few nodes are selected as anchor nodes whose positions are known in advance and they will run the MCL algorithm. In this MCL procedure, an adjacency matrix, S is created by a random walk on the graph. The stochastic matrix, P, is calculated as mentioned in Eq. 2.

$$\begin{aligned} P = Inv (D) \, (S), \end{aligned}$$
(2)

where D is the diagonal matrix whose \(j{\text {th}}\) diagonal element is equal to the sum of \(i{\text {th}}\) row of S. \(P_{ij}\) shows the communication probability between node i and j. Perform an expansion operation, \(P^{n}\), which gives the probability of random walk from node i to node j in ’n’ steps (n hops). Next is the inflation operation that modifies the values within the columns of the stochastic matrix as shown in Eq. 3 (Szilágyi et al. 2010).

$$\begin{aligned} P_{ij}^{(m+1)} = (P_{ij}^{(m)})^r \,\, \left( \sum _{k=1}^{N}{(P_{kj}^{(m)})^r}\right) {^{-1}} \end{aligned}$$
(3)

Here ‘r’ represents the inflation rate (r = 2 for this protocol) and N is the total nodes in the network. Repeat Expansion and Inflation operations until the convergence state is reached. Clustering is carried out by identifying equivalent classes (columns having similar values forms a cluster). After identifying cluster members, a weight value for each member node is calculated based on its residual energy, degree, and proximity with anchor node. CH is a node with higher weight. This should be done periodically to handle mobility of sensors. The data sensed by nodes is collected and aggregated by CH. It is then forwarded to nearest anchor node in its range. Anchor nodes forward data to the sink via direct or indirect (multi-hop) communication. Compared with existing algorithms LEACH-M (Kumar et al. 2008) and E-MBC (Sahi and Khandnor 2016), the proposed method reduces energy consumption and packet loss rate.

3.2.11 CRPD [2018]

Wang et al. (2018) proposed a novel Clustering Routing Protocol for Dynamic WSNs (CRPD). Initially, ‘n’ identical sensor nodes are randomly deployed. These nodes are having unique ID and can move after deployment. Each node can find its position (using GPS) and residual energy. The algorithm works with four phases namely, neighbor node discovery, cluster head selection and cluster formation, data aggregation and route construction, and re-clustering and re-routing. In the neighbor discovery phase, every sensor node broadcasts a message with its unique ID, position, and remaining energy. The nodes that are with in the communication range make use of this message to update their neighbor information table.

In the cluster formation phase, the node with maximum number of neighbors (high degree) having balance energy higher than the threshold (40% of initial energy) is selected as CH. CH will broadcast i_am_chead message. If the node, say i, is with high degree but its residual energy is less than threshold, then another node among its neighbors having the highest degree and residual energy greater than the threshold will be chosen as candidate CH. The node ‘i’ is marked as member node and it will send you_are_chead to all candidate CHs and ordinary message to all other neighbors. If a candidate CH receives you_are_chead message and not received i_am_chead message, it is selected as CH.

In the third phase, CH will send a neighbor request packet as a detect message. The member nodes will send the sensed data to the cluster heads and CH will respond with acknowledgement. The fused data is forwarded to the destination node known as sink and the reply will be send as ACK. This handshaking method will help to improve the reliability of the system. Frequent topology change may happen in MWSN when a new node enters or moves from one cluster to another or when a node dies due to energy depletion. In these situations, the re-clustering and re-routing phase is so important.

3.3 Summary

This section gives a survey on state-of-the-art clustering algorithms in MWSN. The algorithms are classified based on two CH selection approaches namely probabilistic and deterministic. Data aggregation and path selection in these algorithms are done in a distributed manner. The key parameters and characteristics of these algorithms are analyzed in Sect. 5.

4 Measures for quantitative analysis of routing algorithms

The performance of routing algorithms can be quantitatively evaluated using various metrics. The commonly used performance evaluation metrics in MWSN are listed below.

4.1 Performance evaluation metrics

The performance measures such as packet delivery ratio, energy consumption, delay, network lifetime, route acquisition delay, and routing overhaed are defined for MWSN routing algorithms (Cakici et al. 2014; Khan et al. 2012; Anitha and Kamalakkannan 2013; Abuarqoub et al. 2017).

4.1.1 Packet delivery ratio (PDR)

It is the ability of the protocol to deliver packets to the destination (BS). It is defined as the ratio between the received packets by the destination and the generated packets by the source.

$$\begin{aligned} PDR = \frac{Packets\, successfully\, delivered}{Total\, packets\, sent} \times 100 \%. \end{aligned}$$
(4)

4.1.2 Energy consumption (EC)

Energy depletion in nodes may happen due to following reasons:

  • For transmitting and receiving (depends on distance, size of data etc.)

  • Overhead for cluster formation, CH selection, schedule advertisement.

  • Idle/listening state waiting time.

$$\begin{aligned} EC = \frac{Total\, Energy\, Used }{Initial\, Energy}. \end{aligned}$$
(5)

Average energy consumption (AEC): It is the average energy consumed by all the nodes when packet transmission is completed.

$$\begin{aligned} AEC = \frac{\sum _{i=1}^{n} (IE_i - RE_i)}{n}, \end{aligned}$$
(6)

where,

$$\begin{aligned} IE_i&= Initial\, Energy\, of\, sensor\, node\, i.\\ RE_i&= Remaining\, Energy\, of\, the\, node\, i\,\\ n&= Total\, number\, of\, nodes. \end{aligned}$$

4.1.3 Packet delivery delay

It is the difference in time between the packet received at the destination and packet generated at source.

$$\begin{aligned} Total\, delay = Transmission\, delay + Queuing\, Delay \end{aligned}$$
(7)

End-to-End Delay (EED): When a packet is generated for transmission, the difference in time between first bit of a packet is generated at the source and the last bit of the same packet is received at the destination.

$$\begin{aligned} Average\, EED = \frac{\sum _{i=1}^{n} (T{r}_i -T{g}_i)}{n}, \end{aligned}$$
(8)

where,

$$\begin{aligned} T{r}_i&= Time\, at\, which\, a\, packet\, i\, received\, at\, the \, BS\\ T{g}_i&= Time\, at\, which\, the\, i^{th}\, packet\,is\, generated\\ n&= Total\, number\, of\, packets. \end{aligned}$$

4.1.4 Network lifetime

It is the time duration before the energy level of certain number (pre-defined threshold) of total node population (N is the total number of nodes deployed initially) in the network becomes zero. The performance indices used are

  • First Node Dies (FND): Time at which the first node exhausts its energy.

    $$\begin{aligned} FND = {T(E^1_{Th})} - {T_D}, \end{aligned}$$
    (9)

    where,

    $$\begin{aligned} T(E^1_{Th})&= {Time\, when\, energy\, of\,one\, node \,becomes}\\&{\,less\,than\, Threshold} \\ T_D&={Time\, of\, Deployment}. \end{aligned}$$
  • Half of the Nodes Alive (HNA): Time when the half of the nodes exhausts its energy.

    $$\begin{aligned} HNA = {T(E^{\frac{N}{2}}_{Th})} - {T_D}, \end{aligned}$$
    (10)

    where,

    $$\begin{aligned} T(E^{\frac{N}{2}}_{Th})&= {Time\, when\, energy\, of\,N/2\, nodes\,becomes}\\&{\,less\,than\, Threshold}\\ T_D&={Time\, of\, Deployment}. \end{aligned}$$
  • Last Node Dies (LND): Time when the last node’s energy exhausted.

    $$\begin{aligned} LND = {T(E^N_{Th})} - {T_D}, \end{aligned}$$
    (11)

    where,

    $$\begin{aligned} T(E^N_{Th})&= {Time\, when\, energy\, of\,all\, N\, nodes\,becomes}\\ {}&{\,less\,than\, Threshold}\\ T_D&={Time\, of\, Deployment}. \end{aligned}$$

4.1.5 Routing overhead

It is the ratio of the total routing (control) packets generated to the total packets (data and routing) transmitted.

$$ \begin{aligned} Routing\, Overhead = \frac{Number\, of\, control\, packets}{Number\,of\,data\, \& \,control\,packets}. \end{aligned}$$
(12)

4.2 Summary

The various performance metrics used to analyze distributed cluster based routing algorithms with respect to MWSN are defined in this section. These metrics’ are evaluated in Table 3, Sect. 5.

5 Comparison

This section gives a comparison of distributed cluster based routing protocols in MWSN considered for the survey. The CH selection process, the algorithm characteristics and its limitations are explained in Table 2. In Sect. 5.2, the evaluation metrics are summarized. Table 4 is the summary of simulation parameters used for experimentation of the algorithms.

5.1 Comparison of CH selection process, results, and limitations of the algorithms

Table 2 gives the description of CH selection and cluster communication processes used in the papers surveyed. Choosing the parameters for CH selection is very important as it plays a major role in clustering process. The results when compared with the existing works and its limitations are also listed here.

Table 2 Comparison of algorithms with observed results and limitations

5.2 Performance metrics comparison of algorithms

This section gives a view on the quantitative measures used for evaluating the performance of the algorithms described in Sect. 3.2. Evaluation is summarized in Table 3.

Table 3 Performance metrics considered for existing works

5.3 Simulation parameters used

The parameters used and the assumptions made during simulation of these protocols are listed in Table 4. All the methods listed below uses Random Way Point Model as its mobility model. The simulation of each protocol is done in different network scenarios. The size of data and control packet varies for each protocol. The maximum velocity of mobile nodes used for simulation is 25 m/s so that the protocols are suitable for vehicular network applications.

Table 4 Parameters used for simulation

5.4 Summary

From Table 2, it is observed that, the intra cluster communication is 1-hop but inter cluster communication is assumed to be multihop, since transmission range of nodes is short and it cannot directly communicate with BS. The cluster size is unequal and number of clusters is variable in all cases. The cluster formation mechanism and optimal cluster head selection improves the network lifetime, PDR, and energy efficiency. Performance of most of the algorithms are evaluated based on PDR, lifetime, and Energy consumption as it is show in Table 3. EED, routing overhead, bandwidth efficiency etc. are not considered for analyzing the performance. Table 4 gives an overview of tools used for simulation and various parameters considered at the time of implementation.

6 Research challenges, remedies and future directions

6.1 Data collection and routing challenges in MWSN

6.1.1 Cluster formation in dynamic environment

The selection of clusters should minimize the overlapping between them and at the same time it must ensure coverage and connectivity among the nodes. The parameters like number of nodes, deployment area, position of the BS, speed of the mobile device, RSS etc. need to be considered before cluster formation.

6.1.2 Data collection delay

In time constraint applications like health and environment monitoring, the use of WSN and IoT enabled services requires fast delivery of data. So minimizing data collection delay is one of the mojor challenges in MWSN.

6.1.3 Communication overhead

The inter-cluster as well as intra cluster communication should be minimized to improve the lifetime of the network.

6.1.4 Load balancing

As the nodes are moving around, data rate may not be uniform in all the clusters. This will result in overloading at some nodes and they may die so quickly.

6.1.5 Fault tolerance

The nodes in MWSN may fail due to energy depletion or link failure. Also, the relay nodes nearer to the BS may die quickly, that leads to hotsopt problem in the network.

6.2 Solutions to overcome the challenges

  • In order to overcome the issues related to frequent topology changes in MWSN, dynamic re-clustering can be done at some specified intervals.

  • Instead of single-hop, multihop communication improves the data collection delay and packet loss rate.

  • Along with CH selection, a Vice-CH can also be selected to avoid the CH failure problem due to energy depletion or moving out of region.

  • Use of resourceful gateway nodes or relay nodes can be used in place of CH for improving data delivery rate.

  • Meta-heuristic algorithms with efficient fitness function can be developed for identifying CHs and optimal path for data collection.

  • Path re-construction policies can be framed depending on the position, speed and residual energy of the mobile nodes. Network lifetime can also be improved in this way.

  • Algorithms can be framed to generate sub-optimal paths to the BS. This sub-optimal paths (El Fissaoui et al. 2019) will be useful when the optimal path is broken. This will help to reduce packet loss and improves PDR.

  • By reducing the number of control packets, routing overhead can be minimized. It improves the PDR and AEC in the network and thereby enhancing lifetime.

  • In order to mitigate hotsopt problem, the concept of Mobile Sink (MS) can be introduced in MWSN (Kumar and Kumar 2019). Since MS is resource full, it can move in a path to get the collected information from CHs.

  • Use of multiple MS improves PDR and End to End delay during data collection.

  • The data collection points of MS can be identified as Rendezvous Points (RPs) and heuristic approaches for RP selection can be developed. One such approach is mentioned in El Fissaoui et al. (2019) for WSN.

6.3 Future directions

As a future work in this line of research, the following issues can be focused.

  • A hybrid approach to find optimum number of clusters and to choose the best CH evenly from a dynamic network with the help of meta-heuristic approaches (Zou and Qian 2019).

  • A multihop communication model with an efficient fault-tolerant mechanism to deal with CH failures and hotsopt problem that enhances the network lifetime.

  • Study the applicability of multiple MS for data collection from MWSN and develop a strategy to eliminate data duplication at sink.

  • Use Nature inspired algorithms like Particle Swarm Optimization (PSO), Firefly, Emperor Penguin Optimization (EPO) and Gravitational Search Algorithm (GSA) to select RPs and to find an optimal travel path for MS in WSN (Singh and Kumar 2020) that ensures full connectivity.

This section summarizes some research challenges, its causes and some ways to deal with these issues in future.

7 Conclusion

Recently, many survey papers on routing in WSN/MWSN are available. This paper focuses on distributed cluster based routing algorithms for MWSN in recent years. The algorithms are classified and analyzed based on the two CH selection approaches, viz probabilistic and deterministic. From our analysis, it is observed that the grouping of nodes into clusters solves the scalability issue in MWSN. But dynamic clustering and frequent route update is needed in mobility based applications. The re-clustering and re-routing become very complex in such situations. Some protocols tries to improve the reliability by providing acknowledgment and fault-tolerant mechanisms, but this will cause an increase in the control overhead.

It is clear from the study that, none of the protocols analyzed all the performance metrics’. All these protocols described above use a distributed approach for forming groups since centralized method is not appropriate when number of nodes is too large. But this will lead to uneven distribution of CH in the entire network and may also increase the communication overhead. There are lot of opportunities for enhancement and the future scope in this area is explained in detail. So developing a cluster based routing protocol for MWSN is application dependent and specific to user’s requirements.