1 Introduction

A wide range of applications uses wireless sensor networks. A number of applications have become possible in Wireless Sensor Networks (WSNs) due to advances in wireless and sensor technologies [1]. These include battlefield surveillance, environment monitoring, healthcare, agriculture, and home automation. The industrial and research community has shown great interest in wireless sensor networks. There is a rapid growth of WSNs because of their small size and low cost, plus the fact that each device has a processor, sensing, and communication capabilities on a single integrated chip that uses a low-power energy supply. Since the deployment of the wireless sensor nodes in WSN is usually in hostile or harsh conditions, there is no pre-configured infrastructure. The restoration of connectivity and coverage in WSN is therefore a primary objective.

Multiple functional segments of sensor nodes are rendered inaccessible when sensor nodes fail, partitioning the coverage area into disjoint segments. Nodes that fail in a WSN disrupt the network's connectivity and cause important sensed data to be lost. WSNs suffer from large-scale node damage, which results in many disjoint segments, and information cannot be routed from source sensor nodes to the base station. In order to maintain the network and observe and route critical data to remote centers, the network's rapid connectivity restoration is essential. One solution is deploying additional sensor nodes. However, it is often out of question particularly in environments where human intervention is limited. Therefore, it is very important that the connectivity restoration mechanism should be able to work in a self-organized way by making use of the existing sensor nodes.

During the operation of the WSN, connectivity restoration is critical. In a WSN, all the nodes sense and process important information. It is important to conserve the energy of sensor nodes so that all the nodes can function over an extended period of time. Energy efficiency has been the major focus of researchers and clustering plays a significant role in achieving energy efficiency and scalability in WSNs [2]. In clustering, the sensor nodes in a WSN are grouped in such a fashion that the resultant topology is a hierarchical topology. Each group is headed by a resourceful node called cluster head. All the other nodes from the group send their sensed information cluster head. After receiving information from other nodes from the group, the cluster head performs aggregation of data, elimination of redundant data, compression of data, and then forwarding it to the sink node. During the course of operation, there is a possibility that the cluster head may fail due to depletion of battery or an environmental event [2]. This leads to the re-selection of the cluster head.

In this paper, we introduce a connectivity restoration mechanism based on the concept of clustering. The proposed technique is called Connectivity Restoration by Clustering (CRC). As connectivity restoration is based on clustering; therefore, it inherits all the pros associated with clustering. For achieving connectivity restoration, CRC moves a minimal number of nodes as compared to other existing techniques. Our approach utilizes a recovery mechanism called Wireless Broadcast Advantage to deal with one of the well-known cons of clustering known as disruption in inter-cluster communication. CRC uses a distributed cluster-based approach to identify the failed nodes, and for the restoration of connectivity, cluster heads play a significant role. By doing extensive simulation, we prove that CRC outperforms most of the existing state-of-the-art approaches in multiple performance metrics. The main advantages of CRC can be summarized as follows:

  • CRC is based on clustering therefore it brings the benefits of clustering like scalability, energy efficiency, less number of control messages.

  • CRC does not rely on the excessive movement of all neighbors for connectivity restoration.

  • CRC achieves energy efficiency by transmitting a minimal number of control packets.

  • It has a recovery mechanism to deal with failure during inter-cluster communications.

  • Due to minimal movement of nodes for connectivity restoration, CRC results in a minimal reduction in field coverage.

The rest of the paper is organized as follows. In Sect. 2, we elaborate on the literature review by explaining the most relevant related work associated with the considered problem. Section 3 presents the research method, while Sect. 4 illustrates the results and analysis. The paper is concluded in Sect. 5.

2 Literature Review

The sensor field research started in the 1950s as the Sound Surveillance System (SOSUS) was the first sensor network to be introduced [1]. The research community has contributed to connectivity restoration a significant amount of work by considering multiple constraints that affect the connectivity restoration process in WSN. The primary focus of all the techniques designed so far for connectivity restoration is node failure detection and recovery. The component or node failure is detected by failure detection, and it falls into single or collaborative diagnoses depending on the number of nodes that take part in the failure detection process.

In [2], the authors proposed an effective energy-aware approach for fault detection in mobile wireless sensor networks. Different classification methods for fault detection were investigated including SVM, fuzzy SVM, MB-FLEACH, and FCS-MBFLEACH. After extensive simulations, the authors concluded that FCS-MBFLEACH method performs better as compared to all other considered methods.

The mechanism presented in [3] is aimed at the detection of failed nodes in cluster-based WSNs. A cluster head is selected based on the node's residual energy. After the selection of cluster heads, normal nodes associate themselves with exactly one cluster and periodically send information to the respective cluster head. The absence of receiving information from a node means that the node is dead. It is now the responsibility of the cluster head to move in order to restore connectivity.

In [4], the authors proposed a distributed fault detection strategy for wireless sensor networks based on Trend correlation. In this strategy, faulty sensor nodes are detected by analyzing the trend correlation and the median value of neighboring nodes. In addition, the neighboring sensors’ historical data and a self-starting fault detection mechanism are used to improve detection accuracy and to reduce response speed, respectively.

In [5], the authors proposed a centralized two-phase approach for fault detection and recovery in wireless sensor networks, which can recognize different types of faults. In the first phase, faulty data are detected and recovered using the Kalman filter. In the second phase, the types of faults are recognized and reported to the end-user for taking preventive measures. Experimental results revealed the effectiveness of this approach in terms of detection accuracy and data reliability. In [6], a distributed fault detection method for wireless sensor networks considers the importance of nodes in maintaining network connectivity. Sensor nodes are ranked and high local importance nodes are identified and kept in the network as much as possible. To achieve high detection accuracy and low false alarm rate, the probabilistic and deterministic status of the sensor nodes are estimated by comparing their data with the data of their neighboring nodes.

There are two possible ways to move ahead with the connectivity restoration. The first is to select only one neighbor of the failed node that takes part in the recovery process. This approach is followed by PADRA [5]. Another approach to perform connectivity restoration is to allow all the neighbors to take part in the connectivity restoration process. RIM [7] and C3R [8] follow this approach. DARA [9] and PADRA [5] make the decision for the connectivity restoration based on the two-hop information. RIM, DARA, and PADRA rely on cascaded relocation for recovery which results in excessive movement of nodes as all the neighbors of the failed node are involved in the recovery process. The model presented in [10] adopts various approaches like centralized and distributed by introducing mobile robots. A different approach involves three routing protocols that use the fact that different transmission ranges are possible that can prolong network lifetime is proposed in [11]. They use PEGASIS, LEACH, and VGA protocols.

In [12], two novel algorithms based on two and three vertex-disjoint paths respectively are presented. The Connectivity Restoration with Assured Fault Tolerance (CRAFT) algorithm proposed in [13] is designed in such a way that it forms the Backbone Polygon (BP) surrounding the center of the partitioned network area. Relay Nodes (RNs) play a vital role in enhancing connectivity and coverage in an area of interest at a very low cost. RNs use two non-overlapping paths that connect each outer partition to the BP and ensure the connectivity restoration of the network.

In [14], the authors have proposed a novel connectivity restoration technique, namely "Intelligent On-Demand Connectivity Restoration for Wireless Sensor Networks (IDCRWSN)". IDCRWSN is based on a centralized mechanism. Special nodes in the network called the caretaker nodes are responsible for the detection and resolving connectivity restoration. It is proven by the extensive simulations that IDCRWSN efficiently utilizes the residual energy and partial transmission range of the sensor nodes in an integrated manner to restore the network connectivity.

Efficient Solution for Connectivity Restoration (ESCR) [15] makes all the decisions regarding connectivity restoration based on the residual energy of the neighboring nodes. A neighboring node having the most residual energy takes part in the connectivity restoration by moving towards the failed node. As the connectivity restoration process requires a lesser number of nodes to be moved towards the failed node, therefore, ESCR proves to be energy efficient.

In [16], the authors have proposed an energy-efficient connectivity restoration technique, namely "Distributed Energy Efficient Node Relocation (DEENR)". It consumes less energy during the mobility of sensor nodes. In this technique no communication and mobility model is considered for the performance evaluation therefore it is capable of restoring connectivity in only static networks while we consider mobile networks.

3 Research Method

3.1 Problem Formulation

The major objective of this research is to come up with a novel connectivity restoration mechanism based on the concept of clustering. The proposed mechanism should overcome the shortcomings of the existing techniques. The following are the key assumptions used for the design and implementation of our proposed protocol.

  • The total number of N sensor nodes with a uniform distribution is deployed in a given area with dimensions X × Y m2.

  • It is assumed that there are two types of Sensor nodes in the network. Cluster Heads (CHs) and ordinary sensor nodes.

  • All the nodes in the network have the mobility capability.

  • Identification of failed nodes is the responsibility of Cluster heads. This is achieved by the periodic exchange of messages between cluster heads and normal nodes.

  • The energy consumed for sending or receiving a message is given by [17].

3.2 System Model

At time t0, nodes are deployed in the sensing region such that the distribution remains uniform. The transmission range of all nodes is assumed to be Rc. As our solution is based on the concept of clustering, therefore, the first and foremost step is the selection of cluster heads among all the nodes in the network. The criterion that is used for the selection of the cluster heads is based on the residual energy of the sensor nodes. Each node transmits periodic hello messages to all the neighboring nodes. By using these hello messages, each node comes to know about its neighbors. For the selection of cluster head, we have taken motivation from one of our previous works in [18]. Each node in the network sets a timer based on its residual energy. The timer for a node having more residual energy is set to expire before the other nodes and in this way it sends a clusterhead_broadcast message to all the neighboring nodes. A node receiving a clusterhead_broadcast message associates itself with that cluster head. Once the cluster heads are selected (as illustrated in Fig. 1), then as the next step, the ordinary nodes start to send information to the cluster head. Each cluster head has a dual responsibility. Firstly, it collects all the sensed information from the ordinary nodes associated with it. Secondly, it is the responsibility of the cluster head to detect node failures. Node failure can be detected by the absence of periodic hello messages from a neighboring node. Once a failed node is detected in the vicinity, then the cluster head uses NFRecovery algorithm for connectivity restoration.

Fig. 1
figure 1

Formation of clusters and inter-cluster communication

Clustering can prove to be an effective approach [12] that can further be utilized to detect node failures in the network. Each cluster head contains all the neighbors' information like their location, their ID, their energy levels, etc. This information is updated by using periodic Hello messages. Each cluster head keeps track of the energy levels for the neighboring nodes. A threshold is defined, called Et. If there is an absence of a message from a neighboring node Ni, the cluster head checks the Et(Ni) of that node during the previous exchange of the message. If Et(Ni) is found to be below the threshold then it is assumed that the node Ni has failed.

3.3 A Cluster-Based Node Relocation for Connectivity Restoration (CRC)

In this section, we explain the proposed technique called Cluster-based node Relocation for Connectivity restoration (CRC). As discussed in the previous section, the cluster head detects the failed nodes within its vicinity. Once a failed node is detected, the cluster head decides which nodes need to move close to the failed node for the restoration of connectivity. For doing so, the cluster head calculates the sensing area that will be affected (Ae) due to the failure of the node. Ae can be illustrated from Fig. 2 as an intersection of the three sensing ranges of nodes C, D, and F. For CRC, we define a threshold called At, based on which the movement of the neighboring nodes towards the failed node is decided. If Ae becomes equal to or greater than the threshold At, then the cluster head makes a decision regarding the movement of the nodes. This decision is based on distances d1 and d2 as shown in Fig. 2. As the major objective of CRC is to restore connectivity by the minimal movement of the neighboring nodes, therefore CRC selects the node that is closer to the failed node. The cluster head calculates the effective distance called Deffec required for the neighbor to move towards the failed node.

Fig. 2
figure 2

Detection of failed node and node relocation within a cluster

One challenge that arises when clustering is used is the loss of connectivity between the two cluster heads during inter-cluster communication. It is an assumption that all the nodes within the network have mobility capabilities therefore there is a possibility that the two cluster heads move away from the communication range of each other. For dealing with the loss of connectivity during inter-cluster communication, CRC utilizes a mechanism known as Wireless Broadcast Advantage (WBA) [19]. The phenomenon is that when a wireless message is broadcasted, it is not only received by the receiver but also by the intermediate nodes that are present between the source and destination. So during the cluster head selection, there are many nodes that receive announcements from multiple cluster heads. These nodes are referred to as guard nodes. In Fig. 1, nodes C, F, and K (illustrated in red color) receive announcements from two different cluster heads. C, F`, and K are the guard nodes and play an important part if the two cluster heads go out of each other's communication range. When one cluster head sends an inter-cluster communication message towards the other cluster head, nodes C, F, and K receive this message as well. These nodes wait for an acknowledgment from the other cluster head. The absence of an acknowledgment message means connectivity disruption between the two cluster heads. The selection of the guard node responsible for relaying the information is based on a timer. These guard nodes set a timer based on the residual energy such that the timer of a node having more energy level expires first and retransmits the inter-cluster message towards the destination cluster head. This technique acts as a recovery mechanism in case of connectivity disruption among the cluster heads.

figure i

The computational Complexity of the above Algorithm depends upon the input size of set N. As there is only one loop at step 1 which iterates ‘n’ times the number of nodes so the algorithm takes linear time complexity hence:

$${\text{T}}\left( {\text{n}} \right) \, = {\text{ O }}(n_{i} )$$

However, taking each step into consideration we may also calculate the time taken by the conditional statements and other three calculations for the values \({{\varvec{E}}}_{{\varvec{r}}{\varvec{e}}{\varvec{m}}},\boldsymbol{ }{{\varvec{A}}}_{{\varvec{t}}}\), and \({{\varvec{C}}}_{{\varvec{g}}{\varvec{a}}{\varvec{i}}{\varvec{n}}}\) respectively, T(n) still remains linear time complexity of Big O of input ‘n’.

4 Results and Analysis

We used the OMNeT +  + simulator's INET framework [20] to compare the proposed protocol with other baseline protocols. Three existing techniques have been compared to the proposed protocol, namely C3R [8], VCR [21], and ESCR [15]. As for performance metrics, the average number of moved nodes, distance traveled to restore connectivity, and the total number of exchanged packets were compared. Table 1 provides details of the simulation parameters.

Table 1 Simulation parameters

4.1 Number of Nodes Moved

The metric, number of nodes moved is a good indication of the performance of a connectivity restoration protocol. The lesser the number of nodes a protocol moves for connectivity restoration, the better it is in terms of energy efficiency and reduction in field coverage. Based on the average number of nodes moved during connectivity restoration, Fig. 3 shows the performance of all the considered protocols. C3R, a protocol that heavily relies on cascaded relocation yields the highest number of nodes moved among all the considered protocols. Similarly, due to excessive movement of nodes for connectivity restoration, VCR results in a greater number of nodes being moved. As compared to both C3R and VCR, ESCR performs better because it does not rely on cascaded relocation and as a result moves lesser nodes for the connectivity restoration. In comparison with the considered protocols, our proposed protocol performs better by moving lesser nodes for connectivity restoration. One interesting result illustrated by Fig. 3 is that for all the considered protocols, as the number of nodes in the network increases, the resultant number of nodes required for connectivity restoration also increases. However, our proposed protocol does not follow this trend. For the proposed protocol, as the number of nodes increases, the average number of nodes moved required for connectivity restoration decreases. The major reason behind this phenomenon is clustering. As the number of nodes in the network increases, the number of nodes in each cluster also increases. As the cluster head decides the movement of nodes towards the failed node and this decision also considers the reduction in the sensing area; therefore, the presence of more nodes leads to a more optimized decision by the cluster head, hence, moving a lesser number of nodes.

Fig. 3
figure 3

The average number of nodes moved in VCR, C3R, and ESCR

4.2 Reduction in Field Coverage

As the nodes are moved for connectivity restoration, the coverage of the sensing area is affected. The percentage reduction in field coverage with respect to different communication ranges is presented in Fig. 4. It was observed during the simulations that as the communication range increases, the field coverage reduction decreases for all the considered protocols. The two protocols that suffer the most in terms of reduction in field coverage are C3R and VCR. Excessive reliance on the cascaded relocation of nodes is the major reason behind this observation. Excessive mobility of nodes causes holes in the coverage of the sensor nodes resulting in a drastic reduction in the field coverage. One interesting comparison that can be made from Fig. 4 is of ESCR and our proposed protocol CRC. Unlike our proposed mechanism, ESCR is a distributed connectivity restoration protocol and is not based on clustering. ESCR moves lesser nodes as compared to C3R and VCR, however, as compared to our proposed protocol, it moves more nodes (as shown in Fig. 3). As a result, the percentage field reduction for our proposed protocol is better as compared to all the considered baseline protocols. The improved scalability by utilizing the clustering mechanism for our proposed technique CRC is evident from Fig. 4. As the communication range of nodes in a cluster increases, the cluster head is able to make better decisions regarding the movement of the minimal number of nodes for connectivity restoration. Therefore, a lesser number of nodes need to be moved and the percentage reduction in field coverage is minimal for CRC.

Fig. 4
figure 4

Percentage reduction in field coverage

4.3 Average Distance Moved During Relocation

A graph of the average distance traveled by all the nodes in the network over the simulation period is shown in Fig. 5. Figure 5 illustrates that for all the baseline protocols, there has been an increase in average distances traveled. However, for our proposed protocol CRC, the total distance moved during relocation decreases as the number of nodes in the network increases. Clustering has this effect by nature. Clustering performs very well and is considered highly scalable. Thus, the probability of finding a suitable neighbor close to a failed node increases as the number of nodes increases in the network. So, fewer nodes have to be relocated in order to restore connectivity. In addition, CRC moves the minimum number of nodes required for relocation and does not rely on cascaded relocation, which substantially reduces the distance traveled during relocation. The figure below shows three different variations of the CRC with At equal to 10%, 20%, and 30%. Another interesting observation from the figure is that At affects the average distance traveled for our proposed protocol CRC. Increasing At, results in a decrease in average distance moved.

Fig. 5
figure 5

Nodes vs. distance moved

4.4 Total Number of Exchanged Packets

Comparing all baseline techniques, CRC exchanges the fewest packets. It is known that clustering reduces the number of transmitted packets and maximizes energy efficiency. Moreover, Fig. 6 shows that the proposed technique is scalable. In all baseline protocols, the total number of packets transmitted increases as the number of nodes increases. CRC, the technique we propose, results in the fewest packets transmitted. In order to achieve this, clustering is used to reduce the number of control packets. Additionally, avoiding the excessive movement of neighboring nodes of a failed node also results in fewer packets being transmitted.

Fig. 6
figure 6

Total number of exchanged packets

5 Conclusion

In this paper, a clustering-based connectivity restoration mechanism called Connectivity Restoration by Clustering (CRC) was proposed. CRC efficiently utilizes clustering and employs cluster heads for the detection and resolving of the connectivity disruption due to failed nodes. Moreover, CRC also utilizes a recovery mechanism for resolving connectivity disruption during inter-cluster communication. The effectiveness of CRC was proven by comparing it with three other baseline techniques called VCR, C3R, and ESCR. Extensive simulations proved that CRC outperforms all the considered baseline techniques in terms of multiple performance metrics.