1 Introduction

Since their emergence, wireless networks have seen a growing success in the scientific and industrial communities. Owing to its various advantages, this technology has been able to establish itself as a key element in current network architectures. Wireless media offers unique properties, which can be summarized in three points: ease of deployment, ubiquity of information and low-cost installation. During its evolution, the wireless paradigm has seen the emergence of various derived architectures, such as: cellular networks, wireless local area networks and others. During the last decade, a new architecture has emerged: wireless sensor networks. This type of networks results from a fusion of two fields of modern computing: embedded systems and wireless communications. WSNs are usually composed of a huge number of sensor nodes that collaborate with each other to monitor an area of interest in order to ensure specific tasks like forest fire detection, etc. However, WSNs have many constraints that motivate many researchers and reside mainly in the fact that their resources are limited in terms of communication, computations and energy. In particular, the constraint related to the energy is considered as a fundamental problem. Indeed, it is of utmost importance to efficiently manage the energy consumption in order to protract the network lifespan.

Sensor nodes are generally battery powered. However, in most sensor network applications, the nodes are deployed in harsh environments and it is not practical to recharge or replace their batteries. As a result, the overall lifetime of the network depends entirely on sensor nodes’ batteries. To address this issue, extensive works based on low-power radio communication hardware, energy-aware Media Access Control (MAC) protocols, etc. have been done for the sake to save the energy as much as possible. However, energy efficient clustering and routing algorithms [1, 2] are considered as the most two promising fields that have been studied extensively for WSNs.

In cluster-based WSNs, the sensor nodes are organized into distinct clusters. Each cluster is under the management of a master node called CH. In this organization, the nodes of a cluster send their data directly to the corresponding CH. This latter collects data from its cluster member nodes to send them directly or possibly through other CHs to the sink. Clustering a WSN has the following advantages. (1) It ensures data aggregation at CH level which can reduce energy consumption by discarding redundant data. (2) Routing can be managed easily since only special nodes such as CHs need to maintain the local route setup of other CHs and thus need small routing information. Furthermore, this will improve significantly the network scalability. (3) It also conserves the communication bandwidth because sensor nodes communicate only with their respective CH and thus avoiding the exchange of redundant messages between them. However, in the clustering method, a CH bears a heavy workload by receiving data of cluster members and other CHs, data aggregation and data transmission to the sink. Usually, the CHs are chosen among normal sensor nodes which further aggravates the situation as the nodes with extra load can die quickly because of their high-energy consumption. Accordingly, many researchers [3,4,5] have proposed the use of some advanced nodes provisioned with more energy. These nodes are treated as CHs and undertake the same functionality of them. Unfortunately, the introduced nodes are in turn battery operated and hence they are energy-constrained. Therefore, it is extremely important to properly utilize their energy in the process of both clustering and routing. Also, repetitive clustering may result in much energy consumption of sensor nodes. This problem should be tackled efficiently [6] to avoid the early death of the WSN. Additionally, most of the existing protocols ensure non optimized route finding, which disrupts the data routing and may result in frequent route failures as well as these protocols perform fault tolerance by a periodic WSN’s re-clustering, which requires extra communication overhead, thus consuming high amount of energy [7].

In this paper, we address the problem of designing energy efficient clustering and routing algorithms for WSNs which are fault-tolerant by proposing ECRP. In this protocol, not only the cluster setup process is done only once and the CH role is rotated, based on energy, among members until the network stops to function, but also it can adapt the network topology change by handling adding of new nodes and nodes failure issues. Also, sensor nodes which have not any CH in their communication range are attached to a CH using multi-hop mode. For inter-cluster communication, the proposed multi-hop routing method is conceived in such a way that the energy consumption is balanced and minimized. As well, a fault-tolerant mechanism is proposed to cope up with the failure of CHs and relay nodes.

We perform extensive simulations on the proposed protocol and compare its results with those of the recent and relevant protocols namely EA-CRP [8] and BPA-CRP [9]. The simulation results show that our proposed protocol performs better than these protocols with respect to network lifetime, energy efficiency, and packet delivery ratio.

The remainder of this paper is organized as follows. In Sect. 2, we discuss the related works. Section 3 presents the network model along with some terminologies used in the paper. In Sect. 4, we give a detailed presentation of our proposed protocol. The performance evaluation is given in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Related work

Numerous cluster-based routing protocols have been proposed in the scientific literature. One of the prominent clustering protocols is called low energy adaptive clustering hierarchy (LEACH) [10]. The network lifetime of LEACH is divided into rounds, each of which is split into setup and steady-state phases. Unfortunately, the CH selection method in this protocol is random which means that it does not guarantee the selection of the best nodes to act as CHs in each round. Furthermore, LEACH is not suitable for large scale networks as it uses single hop communication mode between CHs and the sink, which may deplete energy of faraway CHs due to the long-haul communications. A centralized version of LEACH called LEACH-C is proposed in [11]. Contrariwise to LEACH method which is distributed, LEACH-C is centralized as the sink is responsible to discover the optimal number of CHs and the best nodes to handle this. Another protocol called Node Ranked-LEACH (NR-LEACH) [12] is proposed to enhance the performance of LEACH protocol. The enhancement aims to distribute the energy load among the sensor nodes in the CH election phase. This is done by using node rank algorithm which depends on the path cost and the number of links between sensor nodes to elect a CH for each cluster. The proposed algorithm overcomes the random process selection adopted in LEACH protocol, which leads to unexpected failure for some CHs. Unluckily, in LEACH-C and NR-LEACH, the sink must have accurate information about nodes such as their location and their residual energy which causes a heavy burden on sensor nodes because of the direct communication mode with the sink. The authors in [13] have proposed a distributed unequal cluster-based routing algorithm called DUCR which deals with the hotspot problem by using unequal clusters formed in such a way that the cluster’s size reduces in the direction of the sink. In these clusters, nodes with high energy are selected as CHs. DUCR proposes also an energy-efficient routing algorithm to balance the relay load among the CHs. However, in the CH selection phase, the authors do not put attention on how coverage of the network is ensured since a normal sensor node may not find any CH within its communication range. In [8], the authors proposed an Efficient and Energy-Aware Clustering and Routing Protocol (EA-CRP) to reduce the energy consumption among sensor nodes. EA-CRP divides the field into a number of layers where the layer is composed of a number of clusters and the layer width decreases in the direction of sink. Similar to LEACH, the network lifetime of EA-CRP is split into rounds where the round time is specified by the sink node. For the sake of energy balancing, double CHs namely leader head and CH are used in a cluster where the leader head is used to collect and aggregate data from cluster members and the CH is used to relay data between layers till reaching the sink. In this protocol, the leader head is chosen based on energy of a node and its location among all other nodes while the CH is selected based on energy and distance of a node to the sink. The authors employ a multi-hop routing protocol that considers remaining energy and closeness of relay node. The major drawback of this protocol lies in the fact that it generates many control messages that absolutely gives rise to decreasing the network lifetime. A distributed clustering and routing algorithm for WSNs is proposed in [14]. The data routing is conceived in such a way that the energy consumption is minimized and balanced among nodes. Here, the authors assume that CHs are fixed beforehand which makes the network lifespan dependent only on that of these nodes and thus the energy consumption is uneven in the whole network. A Prolong-Stable Election Protocol (P-SEP) is presented in [15] and it is a modified SEP that allows to prolong the stable period of Fog-supported WSN by maintaining balanced energy consumption. In P-SEP, the decision to change the CH is probabilistic which offers a good chance that every node is chosen as a CH regardless of its energy. Further, P-SEP establishes one-hop intra- and inter-cluster topology in which each node transmits its data directly to the CH and in turn the latter transmits the aggregated data to the fog node. In the situation where the CH or fog node is faulty, the sensor nodes’ data are prevented to reach its final destination.

The aforementioned protocols select CHs periodically without concentrating on the energy efficient period of CH replacement. For this reason, these protocols unnecessarily select and replace CHs, and as a result it may dissipate the limited power of sensor nodes. Such energy overhead can significantly reduce the network lifetime.

To save more energy by eliminating the overhead which takes place in network setup of every round, a Fixed Low Energy Adaptive Clustering Hierarchy (LEACH-F) [16] is proposed. In this protocol, the sink selects CHs and cluster members once and the network keeps functioning until it dies. For energy balancing purpose, the CH role is rotated across all cluster members. Although this work may extend the network lifespan, still there exist some shortcomings. In fact, the static aspect of the network is restrained since after cluster formation, no node can be added or removed from the network. Moreover, node mobility cannot be treated. Another work [17] is proposed and it is based on the round-robin method. Additional fault-tolerant message in the frame is needed to remove abnormal sensor nodes. However, no nodes can be added to the network.

Another category of works has been done to mitigate frequent CH selection by delaying new round’s coming. In [18], unlike LEACH, re-clustering is not done after completion of each round but it is done once the energy of the elected heads goes below a threshold. The closest nodes to the sink are selected as CHs and the sink finds, based on energy, next heads for each cluster. However, the single hop communication mode between CHs and sink used in this protocol is not energy efficient because faraway CHs may deplete their energy quickly. Thus, this leads to uneven energy balancing problem. Also, there is still the need to re-cluster the WSN which may result in an overhead exchange and accordingly much energy is consumed. The authors in [19] proposed a modification to the CH selection process of LEACH in which an improved formula is used to reduce energy consumption. They introduce vice CH which replaces the main CH in the later period of steady-state phase. This is for the aim to reduce the energy consumption during re-clustering and prolong the time of being in a steady-state phase. Yet, this extends the network lifetime but after a certain time, the vice CH will also consume more energy due to its heavy tasks undertaken as main CH. This calls for new setup process and thus much overhead will again take place. Also the work suffers from direct communication between CHs and the sink which is not adequate for large scale networks as we mentioned in prior. In [9], it is proposed a Balanced Power-Aware Clustering and Routing protocol (BPA-CRP). The network topology in this protocol is divided into equal-sized layers and clusters. A forwarder node resides in each layer and it is capable of relaying the collected data from the CHs within a layer and faraway forwarders towards the sink. The clustering algorithm in BPA-CRP allows the cluster to operate during multiple rounds (batch) without any need for setup overhead. In every round, the forwarder node verifies if its energy dips below a predefined threshold. If it is the case, new setup phase starts in order to select forwarders and form new clusters. However, BPA-CRP is still using random head selection process without specifying any criteria to choose CHs. Although this work alleviates the amount of exchanged overhead there still remain the problem of energy overhead which takes place in the CH replacement phase of every batch’ beginning.

The proposed ECRP has the following advantages over the existing protocols.

  • The existing protocols [8,9,10,11,12,13,14,15, 18, 19] require repetitive clustering which consumes more energy due to the excessive overhead during every cluster setup phase. Whereas, in ECRP the cluster setup is done only once until the network lifetime ending.

  • ECRP selects CHs in each round based on energy in contrast to [9, 10, 15,16,17].

  • The WSN performance is highly affected by how a non-CH node chooses its CH among multiple options. The most protocols [8,9,10,11,12, 15,16,17,18,19] use only a single parameter for CH selection, while in ECRP multiple parameters are used to get an optimal solution.

  • ECRP takes care of the sensor nodes which have no CH within their communication range in contrast to [8,9,10,11,12,13, 15,16,17,18,19] that do not take this issue into consideration.

  • Nodes can be added to the network in ECRP in contrast to [16, 17].

  • None of the existing protocols address the energy-efficient period of the CHs replacement and fault-tolerant routing issues together. In the proposed work, we consider both these issues.

To summarize, there are two categories of protocols. A category of protocols selects CHs periodically, without considering energy-efficient period of the CHs replacement. Hence, unnecessary CHs selection may dissipate limited energy of sensor nodes. Such energy overhead can considerably shorten the WSN lifetime. On the other side, another category of protocols resolves this issue. The first category of protocols neglects the energy-efficient period of CH replacement and it supports network topology change precisely adding/removing sensor nodes to/from the network, while the second category of protocols concentrates on the energy-efficient period of CHs replacement but it neglects the network topology change. For this reason, we propose in this paper an optimal solution called ECRP that resolves the issues of the above mentioned protocols categories. In fact, ECRP supports the network topology change and allows to perform cluster setup only once which results in reducing the overhead. Therefore, all of just mentioned enhancements not only improve the network flexibility but also contribute in accomplishing energy efficiency and balancing goals.

In the following, we introduce the network model along with some terminologies to make better readability of the paper.

3 Network model

We assume that the network is composed of homogenous sensor nodes that have been uniformly deployed over a target area as in [8, 9]. These nodes are assumed to be stationary after their deployment. The sink is also stationary and has no energy constraint. The sensor nodes communicate with each other using a wireless link. They have always data to transmit to their corresponding CH and then each CH aggregates the received data and sends them in a message to a relay node. This process is repeated until reaching the sink. All communications are over wireless links. A wireless link is established between two nodes only if they are within the communication range of each other. All nodes use the same transmitting power, and thus, a node can calculate the distance to another node based on the received signal strength. The same strategy has been used in [20, 21]. Therefore, it does not require any location finding system like global positioning system.

The terminologies used in this paper are described below.

  • \({S} = {S1, S2, \ldots Sn}\): Set of sensor nodes, where Si is a sensor node.

  • \({X} = {X1, X2, \ldots Xn}\): Set of CHs, where Xi is a CH.

  • Backup_CH(Si): Set of candidate CHs of sensor node Si.

  • Forwarding Candidate Set (FCS): Contains the neighboring nodes.

  • \(Dist_{ij}\): Distance between the nodes i and j.

  • \(ResidualEnergy_{Si}\): Residual energy of Si.

4 Protocol description

Here, we discuss the proposed protocol that has the following 4 phases: (1) start phase, (2) clustering phase, (3) data-forwarding phase, (4) and topology change phase.

4.1 Start phase

In this phase, each node determines some relevant information such as its distance from the sink, its neighbors, and distance from its neighbors. This information is saved in the FCS. Initially the sink broadcasts a HELLO message which can cover the entire target area since this node does not have any power constraint. On the other hand, a node that receives the message computes the distance to the sink based on the received signal strength. Afterwards, each node diffuses another HELLO message to its neighbors. In this way, every node can compute its distance to a given neighbor as well as it will have an overview on its neighbors. This information will be used in next phases.

4.2 Clustering phase

To achieve our main goal of minimizing energy consumption in WSNs, a powerful distributed clustering algorithm is proposed in this paper. This algorithm aims at avoiding repetitive clustering. In fact, the great issue from which the most proposed literature protocols suffer is the repetitive overhead exchanged in the beginning of each round or batch especially in setup phase. To handle this issue, we propose to ensure the setup phase only once in the first round and the network continues to function for the rest of rounds until it is out of service. The CH role is rotated based on energy among elements of the cluster. Based on the connectivity property, some sensor nodes may not fall in the communication range of a CH. For this, the proposed protocol takes care of these nodes. More details on clustering phase are given bellow.

We begin by explaining the proposed clustering method. It is interesting to mention that the CHs are selected initially based on the same idea presented in [10] and afterwards the CH selection is based on the residual energy of nodes. To ensure the local decision-making to become CH, each node n chooses a random number between 0 and 1 and if this number is less than a threshold value T(n) defined in Eq. 1, it becomes CH for the current round.

$$\begin{aligned} T(n)={\left\{ \begin{array}{ll} \frac{P}{1-P*\left( rmod \frac{1}{P}\right) }, &{}\quad {\text {if}}\,n \in G.\\ 0, &{}\quad {\text {otherwise}}. \end{array}\right. } \end{aligned}$$
(1)

where P is the desired percentage of CHs (e.g., \({P} = 0.05)\), r is the current round, and G is the set of nodes that have not been CHs in the last \(\frac{1}{P}\) rounds.

Once the CH is nominated, an advertisement message (denoted as ADV_MSG) is broadcasted. This will allow the non-CH nodes to determine their respective CHs. To this end, a receiver node may receive multiple ADV_MSG messages from multiple CHs which are thereupon stored in Backup_CH list. It is known that the performance of WSNs is highly affected by who the non-CH nodes select their CH. Hence, a non-CH node selects the best candidate CH among multiple CHs considering significant parameters to get an optimal solution. This is done according to a cost function which is based on the following principle:

Principle The basic motivation of the cost function is to select the node with more residual energy and shortest distance from the sink and the sender node so that the energy consumption is minimized and balanced. Hence, such a function can sharply increase the cost value when the residual energy is significant and the distance is small. Interestingly, the cost function is conceived such that even the distance and the energy parameters have different metrics, the cost function favors in the selection the node which gives an optimal solution.

Let CHCOST(Si, Xi) be the cost of the CH Xi for the node Si. To define the cost function, we consider the following important parameters.

  • Residual energy of CH: A sensor node should join a higher energy CH in its communication range. It leads to,

    $$\begin{aligned} CHCOST{(Si,Xi)} \propto ResidualEnergy_{Xi} \end{aligned}$$
    (2)
  • Distance from sensor node to CH: A sensor node should join the closest CH in order to minimize the energy consumption since the communication with faraway CH consumes much energy. So, the closest CH has the higher chance to be selected. Therefore,

    $$\begin{aligned} CHCOST{(Si,Xi)} \propto \frac{1}{Dist_{SiXi}} \end{aligned}$$
    (3)
  • Distance from CH to sink: CHs can assure long-haul communication in comparison with ordinary sensor nodes and can communicate directly with the sink. Thus, the faraway CHs consume more energy. Therefore, cluster members of these CHs should be less than that of the CHs which are near the sink. In other words,

    $$\begin{aligned} CHCOST{(Xi,Sink)} \propto \frac{1}{Dist_{XiSink}} \end{aligned}$$
    (4)

    By combining Eqs. 24, we obtain:

    $$\begin{aligned}&CHCOST{(Si,Xi)} \propto \frac{ResidualEnergy_{Xi}}{Dist_{SiXi} * Dist_{XiSink} } \end{aligned}$$
    (5)
    $$\begin{aligned}&CHCOST{(Si,Xi)} = K* \frac{ResidualEnergy_{Xi}}{Dist_{SiXi} * Dist_{XiSink}} \end{aligned}$$
    (6)

    where K is a proportionality constant. Therefore, the cost function depends on the three weight values, i.e., \(ResidualEnergy_{Xi}\), \(Dist_{SiXi}\), and \(Dist_{XiSink}\), the justification of which is already stated. However, because we consider the weight values only for comparison purpose, that’s why the value of K does not hamper the objectives of the clustering i.e., energy efficiency and energy balancing. Thus, without loss of generality, we can assume K = 1. Therefore,

    $$\begin{aligned} CHCOST{(Si,Xi)} = \frac{ResidualEnergy_{Xi}}{Dist_{SiXi} * Dist_{XiSink}} \end{aligned}$$
    (7)

Now, Si selects the CH Xi having the maximum cost value as relay node. Therefore,

$$\begin{aligned} CHCOST(Si, Xi) = MAX \left\{ CHCost(Si, Xk) \right\} \end{aligned}$$
(8)

where Xk is in the communication range of Si.

After CH selection, a non-CH node broadcasts a join-request message (denoted as JOIN_MSG) that consists of the identifier (ID) of the selected CH and node’s residual energy. A node may overhear this message from different sensor nodes belonging to different clusters. Accordingly, it updates its FCS by adding neighbor nodes’ energies and their respective CHs’ IDs. It is noteworthy to mention that it can exist two types of nodes in the network namely connected and unconnected nodes. Connected nodes are those nodes who have at least one CH in their communication range while unconnected nodes do not have any CH in their communication range. A node that has not receive an ADV_MSG message is considered to be unconnected node. On the other hand, this node may receive multiple JOIN_MSG messages from connected nodes. In this case, it chooses the higher energy connected sensor node and informs it about its membership using a JOIN_MSG message that consists of its remaining energy. Moreover, the connected node sends through a JOIN_MSG message the ID and energy of the unconnected sensor node to its respective CH. At this stage, the CH is aware of some relevant information about its members such as nodes’ ID, nodes’ remaining energies. A node list is created in each node. This list contains nodes’ IDs, their remaining energies as well as their order and it is sent to connected nodes and through these ones to unconnected nodes. Note that all the previous control messages are sent only once during the network functioning which undoubtedly results in letting the network function longer. The CH’s role is rotated fairly and based on energy parameter among the CH and its members. To this end, we propose the following method. In few words, at each round every sensor node within a cluster should ensure that it knows the CH for a particular round. This is achieved by giving an order to each node in descendent mode based on the residual energy of nodes. Normally, each node has its FCS, so if a node does not reach a node for which his role has arrived to be CH, it keeps sending its data via its relay node to the CH; otherwise data are sent directly to the CH. After a number of rounds more precisely in the time when the last cluster member holds the CH role, all its cluster members inform it about their remaining energy picky bagged in their data sent. In this way, the CH reorders, based just on the nodes that sent their energy, the node list in descendant mode on the basis of energy. This node list is sent through ACKnowledgment message (denoted as ACK) to the connected members in the last frame and in the same way these nodes in turn send the list to their unconnected members. This allows energetically powerful nodes to become CHs in the following rounds while reducing the control overhead. The same process is repeated until the end of network functioning.

To demonstrate the idea of the proposed clustering algorithm, let us suppose that a cluster has 3 members and their IDs are 1, 15, and 6. The CH node sorts these nodes in descending order based on energy expressed in Joules (J) as illustrated in Table 1 and sends the node list to connected and unconnected members. Each node is then aware of the CHs’ order in its cluster. For example, at round n, node 15 is the CH. At the beginning of round \({n}+1\), node 1 becomes CH and in round \({n}+2\) node 6 becomes CH. Remember that the order in Table 1 can be changed based always on energy parameter when all nodes in the cluster accomplish their role as CHs. Accordingly, in the last round which is in our case round 3, the last node in the list i.e. node 6 reorders the node list based on the received energy of cluster members. Thus, the new order is taken into consideration by the other nodes. This process continues until the network lifetime ending.

Table 1 CH order at round n

Lemma 1

The message complexity of the proposed clustering algorithm is O(m) with m nodes (CH and connected node).

Proof

In the worst case, a node sends in the first round ADV_MSG message and node list if it is a CH; otherwise the unconnected node sends a message to inform the connected node about its membership. In turn, the connected node sends two JOIN_MSG messages to inform the CH about its membership and that of the unconnected node and a message to inform the unconnected node about node list. Hence, in total 6 messages are sent for each n nodes (CH, connected node and unconnected node). The clustering algorithm has the message complexity of O(6n), that is, O(n). Therefore, the worst case total message complexity is O(n).

After every number of rounds, the CH reorders the node list and sends it to connected members through ACK message. In turn connected members send the list to unconnected members. Thus, a sensor node (CH and connected node) sends at most only 1 message. Therefore, in total there is a message complexity of O(2m), that is, O(m). As this case is the most dominant, the overall message complexity of the proposed clustering algorithm is O(m). \(\square \)

Lemma 2

The time complexity of the proposed clustering algorithm isO(m) with m candidate nodes.

Proof

In the first round, each node can compute its probability to become CH independently. So it is executed in O(1) time. During cluster formation, a non-CH node in worst case requires to process n candidate CHs to select its CH, and hence, it executes in O(n) time. After every number of rounds, a CH requires to process m candidate nodes to reorder the node list, and hence it executes in O(m) time. This last case is the most dominant and thus the overall time complexity of the clustering algorithm is O(m). \(\square \)

4.3 Data-forwarding phase

This phase is divided into two main data forwarding modes which are intra-cluster and inter-cluster.

4.3.1 Intra-cluster communication

After the clustering, a CH attributes a time division multiple access schedule to its members in order to send their data in specified time slot for each one and enter the sleep mode otherwise. This allows to avoid (1) collisions between the nodes of the same cluster and (2) the loss of energy due to the overhearing and idle listening. The CH aggregates and forwards the data collected from cluster members to the sink following the inter-cluster communication mode described below.

4.3.2 Inter-cluster communication

We now present the proposed inter-cluster communication method. Here, every node selects its next-hop relay node in such a way that the energy consumption is minimized and balanced. This is implemented as follows. Every CH designates in each round its relay node based on a cost function in order to forward via relay nodes the cluster members’ collected data. At relay nodes’ level, this process continues until reaching the sink. Hence, the main goal of this phase is to find a valid route from CHs to the sink, thus forming a multi-hop path towards the sink. After the clustering phase, each CH, based on its collected information, may have multiple relay nodes that will be CHs and/or cluster members. The relay node’s selection process is done as follows.

  • Step 1 If a sensor node has multiple relay CHs, it selects one of them based on a cost function.

  • Step 2 If no CH exists, the node may have cluster members as relay nodes (gateway nodes) that will allow it to reach a nearby CH. In this case, the selection of a relay node among these nodes is done on the basis of a cost function.

  • Step 3 If there is no gateway node, the CH selects based on a cost function a node from the rest of nodes. This node triggers the same process as in the steps 1 and 2 to find its relay node among CHs or normal sensor nodes. The CHs are favored than normal sensor nodes in the selection process.

It is worth mentioning that the selection of a relay node Sj by Si is done based on a cost function in all the steps above. This cost function considers energy efficiency and balancing parameters like residual energy of Sj, distance from Si to Sj, and distance from Sj to sink. This process is repeated at each relay node/gateway node until reaching the sink. Note that a sensor node that can reach directly the sink is not required to execute the next hop selection algorithm. The cost function denoted as COST is derived similarly as in Eq. 7.

$$\begin{aligned} COST{(Si,Sj)} = \frac{ResidualEnergy_{Sj}}{Dist_{SiSj} * Dist_{SjSink}} \end{aligned}$$
(9)

Now, Si selects the relay node Sj having the maximum cost value as relay node. Therefore,

$$\begin{aligned} COST(Si, Sj) = MAX \left\{ COST(Si, Sk) \right\} \end{aligned}$$
(10)

where Sk is in the communication range of Si and distance from Si to sink is greater than distance from Sk to sink.

In case the selected relay node is an ordinary node, it is informed about its role while if the selected relay node is the CH it does not require to inform it since it radio is on during the whole round. After this process data are sent to the selected relay node. In its turn this node sends the data to its relay node and this is repeated until reaching the sink.

An example of the overall routing method is illustrated in the Fig. 1. Let’s suppose a sensor network composed of 2 relay nodes, 3 CHs and a sink. The routing method consists that X1 forwards its data to X2, in turn X2 routes its data to relay node S1 which sends its data also to S2. The latter forwards its data to X3 in order to send them to the sink.

Lemma 3

The message complexity of the routing algorithm is O(n) with n relay nodes (ordinary nodes).

Proof

After the clustering phase, each node, based on its collected information, selects its relay node. If the latter is normal sensor node it is informed through a control message; otherwise it is not required to inform it. Therefore, in the best case no control message is sent; otherwise one control message is sent which means that the routing algorithm has constant message complexity, i.e., O(1). Therefore, the message complexity of the routing algorithm is O(n). \(\square \)

Lemma 4

The time complexity to find next hop relay node isO(n) with n nodes (CHs and nodes in FCS).

Proof

For data routing, each node requires to calculate a cost value for each other node. For this, in the worst case (Step 3), a node may require to process m nodes in FCS to find a relay node. The time complexity is then O(m). The relay node again is required to process CHs in step 1 and nodes in FCS in step 2 which gives a total of n nodes. Hence, the time complexity is O(n). The worst case overall time complexity is \(O(m)+O(n)\), i.e., O(n) as \(n > m\). \(\square \)

Fig. 1
figure 1

Routing method example

4.4 Topology change phase

In this section, we discuss the case where new nodes are added to the network as well as the case of critical nodes failure.

4.4.1 Adding new sensor nodes

Whenever a new sensor node joins the network i.e. added to the network, it broadcasts a network join-request message (denoted as JOIN) containing its remaining energy and a status set to 1 indicating that it is a new node. The new sensor node requests the network membership until it receives a response. This response indicates the type of responder node if it is CH or normal sensor node. Then, the new added node may receive multiple ACK messages from CHs and/or neighboring nodes. Now, the new node Si selects among the responders its relay node Xj from CHs or Sj from normal sensor nodes. This selection is made based on a cost function that considers the parameters like residual energy of the responder, distance from the new node to the responder, and distance from the responder to sink bearing in mind that CHs are favored than normal nodes in the selection. The cost function for CHs in Eq. 11 and normal sensor nodes in Eq. 12 is derived similarly as in Eq. 7.

$$\begin{aligned} COST{(Si,Xj)}= \frac{ResidualEnergy_{Xj}}{Dist_{SiXj} * Dist_{XjSink}} \end{aligned}$$
(11)
$$\begin{aligned} COST{(Si,Sj)}= \frac{ResidualEnergy_{Sj}}{Dist_{SiSj} * Dist_{SjSink}} \end{aligned}$$
(12)

Now, Si selects the relay node Xj or Sj having the maximum cost value as relay node. Therefore,

$$\begin{aligned} COST(Si, Xj)= MAX \left\{ COST(Si, Xk) \right\} \end{aligned}$$
(13)
$$\begin{aligned} COST(Si, Sj)= MAX \left\{ COST(Si, Sk) \right\} \end{aligned}$$
(14)

where Xk and Sk is in the communication range of Si and distance from Si to sink is greater than distance from Xk or Sk to sink.

Hence, the node that presents the high cost value is selected as relay node using Eqs. 13 and 14 and it is informed about the membership decision of the new node indicating its energy. If the responder is a normal sensor node it informs its current CH about the ID and energy of the new node; otherwise these informations are sent directly to the CH. Thus, the current CH can update its node list and send this to the cluster members to update their node list. Now, the same process applied on the cluster members is done also for the new node.

Lemma 5

The message complexity to join the network is O(\(nT+m\)), where T is the number of times a message is sent, n is the number of added nodes and m is the total of added node, relay node and CH.

Proof

To join the network, in the worst case a sensor node has to send a control message for T times in order to find a potential relay node among CHs or normal sensor nodes. Hence, the message complexity is O(nT) for n added nodes. On the receiving of this message, a node may respond by 1 control message. In the worst case, once the relay node is selected, it is informed about membership decision and in turn this node informs its current CH. All of this gives a total of 3 messages. The message complexity is O(3m), that is, O(m). Therefore, the worst case overall message complexity is O(nT \(+\) m). \(\square \)

Lemma 6

The time complexity to join the network is O(m), where m is the number of responder nodes.

Proof

To join the network, in the worst case a node requires to process m candidate responder nodes to select its CH or relay node, and hence, it executes in O(m) time. Thus, the overall time complexity to join the network is O(m). \(\square \)

4.4.2 Fault tolerance

Usually sensor nodes may be failed due to energy depletion, material damage, or environmental interference, etc. The failure can affect the overall network lifespan and degrades the overall performance of the network. Especially, failure of the CHs and relay nodes is more catastrophic in comparison with that of normal sensor nodes as it can prevents data to reach the destination. Therefore, in order to keep the normal functioning of WSN, the clustering and routing algorithms should cope up with the fault-tolerance aspects, especially the failure of the CHs and relay nodes. To make our protocol reliable and robust, we propose a mechanism which allows to tolerate the failure of CHs and relay nodes. Whenever a cluster member does not receive an ACK message from its respective CH, it realizes that this CH is faulty. In this case, the cluster member Si verifies its Backup_CH list. The CH Xj is chosen from this list based on Eqs. 11 and 13. In other words, the CH with high cost value is selected. If Backup_CH is empty, the cluster member verifies its FCS to seek for a relay node with high residual energy which is attached to a CH. Anyway, a cluster member from the faulty cluster joins a CH node directly or by using a relay node. The joined CH is informed thereupon directly or by using the relay node about the ID and energy of the member of faulty cluster. The CH based on the received information updates its node list and sends this to the cluster members to update their node list. The failure of the relay node is detected when a node during data routing does not receive an ACK message from its relay node. In this case, the same process as in Sect. 4.3.2 is done to find a relay node. When the relay node is founded and it is an ordinary node, it is informed about its new role. Afterwards, the data are forwarded via the new relay node.

Lemma 7

The best case message complexity is O(r), where r is the number of receivers and the worst case message complexity of fault tolerance algorithm is\(O(n+m)\), where n is the number of relay nodes and m is a total of nodes (sensor node, relay node and CH).

Proof

A receiver node in the data transmission path has to send an ACK message to ensure the reception of data. Thus, for detecting both failures of CHs and relay nodes, for every receiver it has to send 2 messages. Therefore, the message complexity is O(2r), and the best case total message complexity is O(r).

Whenever an ACK message is not received from a CH, the cluster member informs its found relay node through a control message and in turn this node informs its CH. The latter updates its node list and sends it to connected members and in turn these nodes send the list to unconnected members. This gives a total of 4 messages. Therefore, the message complexity is O(4m), that is, O(m). If an ACK message is not received during data routing, the worst case message complexity is the same as in Lemma 3, i.e., O(m). So, the worst case message complexity is \(O(n+m)\). \(\square \)

Lemma 8

The time complexity of fault tolerance algorithm is O(n), where n is the number of nodes (CHs and nodes in FCS).

Proof

To recover from a failure state, a sensor node is required to process a total of n nodes (CHs and normal sensor nodes from Backup_CH and FCS respectively) in the worst case. Thus, the time complexity is O(n). On the other hand, to recover from a failure of a relay node, the worst case time complexity is the same as in Lemma 4, i.e., O(n). Therefore, the total time complexity is O(2n), i.e., O(n).

The pseudo-code of the ECRP algorithm is shown in Algorithm 1. \(\square \)

figure d

5 Performance evaluation

Here, we evaluate the performance of our proposed ECRP algorithm and compare it with those of 2 recent and relevant existing protocols: BPA-CRP and EA-CRP. Now, we first describe the energy model followed by simulation parameters and performance metrics, then we present the simulations scenarios and results.

5.1 Energy model

We use the same energy model as discussed in [22, 23]. The resource manager of Castalia [24] is a module responsible to compute the energy consumed in different states such as transmission, reception, etc. The typical energy of AA batteries is 18,720 J. Energy is linearly subtracted based on total power drawn and time passed. The modules that model the hardware devices namely the radio and the sensor manager send messages to the resource manager so as to signal how much power they currently draw. Then the resource manager has an overview of the overall power drawn and based on this it computes the energy consumed each time there is a change in power or periodically i.e. if the power does not change for some time. CC2420 and CC1000 define the real radios of the same name by Texas Instruments. For our simulations, we choose the widely used CC2420 radio.

5.2 Simulation parameters and performance metrics

The simulations are carried out on Intel Core i5 Laptop using Castalia simulator version 3.2. The radio propagation model used in simulations is lognormal shadowing model. The formula in reference [22] that returns path loss in dB as a function of the distance between two nodes is defined in Eq. 15.

$$\begin{aligned} PL(d) = PL(d_{0})+10 \cdot \eta \cdot \log \left( \frac{d}{d_{0}}\right) + X_{\sigma } \end{aligned}$$
(15)

where PL(d) is the path loss at distance d, \(PL(d_{0})\) is the known path loss at a reference distance \(d_{0}\), \(\eta\) is the path loss exponent, and \(X_{\sigma }\) is a gaussian zero-mean random variable with standard deviation \(\sigma\).

The overall simulation parameters fixed for our simulations are summarized in Table 2.

Table 2 Simulations parameters

We have conducted many simulations in order to evaluate the performance of our protocol. For this purpose, the main metrics used are stated as follows:

  1. 1.

    Network lifetime: Two definitions have been chosen to examine the robustness and efficiency of our proposed protocol which include the follows:

    • First Node to Die (FND): The time calculated from the initial deployment to the time when the first node is dead, measured in rounds.

    • Last Node to Die (LND): The time calculated from the initial deployment to the time when the last node is dead, measured in rounds.

  2. 2.

    Energy efficiency: It refers to the average of sensor nodes’ residual energy per round.

  3. 3.

    Packet delivery ratio (PDR): Is the ratio of data packets successfully received by the sink to the total number of data packets sent.

5.3 Simulation scenarios and results

For the sake to compare the performance of ECRP, BPA-CRP and EA-CRP protocols and show the significance of our contributions, we have considered 4 simulation scenarios. We vary the number of nodes and simulation area from scenario to another. Indeed, we deploy 100 nodes in (60 m * 120 m), 179 nodes in (80 m * 160 m), 280 nodes in (100 m * 200 m) and 716 in (160 m * 320 m) respectively in scenarios namely WSN#1, WSN#2, WSN#3, and WSN#4. The network density is kept equal to 0.014 node/\({{\rm m}}^{2}\). For each scenario, we measure the network lifetime, energy efficiency and packet delivery ratio. Below, we analyse the simulations results.

We present, for each scenario, the network lifetime in Fig. 2 and Table 3 and the average sensor nodes’ remaining energy according to the number of rounds in Fig. 3. It can be seen that the network lifetime and average sensor nodes’ remaining energy decreases as the number of rounds increases. This is obvious since the sensor nodes ensure different tasks that deplete their energy. Moreover, it can be observed that ECRP outperforms the other protocols in terms of network lifetime and average sensor nodes’ residual energy. More the protocol conserves as much as possible the nodes’ energy, more nodes remain alive. This is the case of ECRP. It can be also observed from Figs. 2 and 3 and Table 3 that the number of alive nodes and the average residual energy of sensor nodes decrease as the couple sensor field size/number of sensor nodes increases and this is expected due to the increase in the distance required to transmit the sensed data to the sink in addition to more communications that will take place. The long distance transmission and the more communications in the network result in high-energy consumption which in turn decreases network lifetime and prevents from the economical use of sensor nodes’ energy. As a consequence, the largest sensor field size (i.e., 160 m * 320 m) with 716 sensor nodes in scenario WSN#4 causes a noticed degradation of network performance in terms of number of alive nodes and average sensor nodes’ remaining energy, while among all the scenarios, the scenario WSN#1 configured with 100 sensor nodes in 60 m * 120 m contributes to having the best network performance.

Fig. 2
figure 2

The number of alive nodes

Table 3 The FND and LND in ECRP, BPA-CRP and EA-CRP (In rounds)
Fig. 3
figure 3

The sensor nodes’ average residual energy

The justifications for the improved performance of the ECRP over the existing protocols BPA-CRP and EA-CRP are summarized as follows:

  1. 1.

    The desired behavior behind the better performance of ECRP over its counterparts consequences basically from the fact that the setup phase is done in ECRP only once until the network stops to function. In other words, the protocol avoids the massive control overhead due to re-clustering which undoubtedly reduces the energy consumption of sensor nodes and thus protracts the overall WSN lifetime. Despite of, BPA-CRP tries to reduce the overhead due to re-clustering still there are messages exchanged in the beginning of each batch which increases the energy consumption and reduces the network lifespan. EA-CRP presents the worst performance because of its frequent re-clustering which requires a massive overhead exchange and as a result the network performance is degraded in terms of energy consumption and lifetime.

  2. 2.

    In ECRP and EA-CRP, the higher energy nodes are favored during election of CHs, because the higher energy sensor nodes can resist the extra heavy load of the CHs. Further, the CH’ role is distributed dynamically upon the WSN nodes. These advantages balance the energy consumption of sensor nodes. However, BPA-CRP selects CHs arbitrarily which may result in early death of some nodes and accordingly the energy consumption is uneven.

  3. 3.

    During cluster formation, in EA-CRP and BPA-CRP, the ordinary sensor nodes decide to join a CH based on a single parameter. Such decision results in non-uniform load distribution. Fortunately, in ECRP the non-CH nodes join a CH considering a cost function which is defined on the basis of multiple parameters like the remaining energy of CH, the distance between the sensor node and CH, and the distance between the CH and sink. The selection is then optimized which leads to enhance the network performance.

The PDR is represented for each scenario in Fig. 4. In this figure, we observe that ECRP has better performance against its peers in terms of PDR. This desired behavior results from the point that ECRP is more reliable in forwarding data packets to their destinations.

Fig. 4
figure 4

The packet delivery ratio

6 Conclusion

Many existing clustering protocols have been proposed for the sake of energy balancing and network lifetime maximization. However, these protocols suffer from repetitive clustering which may unnecessarily dissipate limited energy of the overall WSN. In this paper, we propose ECRP aiming at rotating based on energy parameter the CH role among all cluster members of a cluster. The setup phase is done only once until the network stops to function. Not only to this extent, but rather ECRP can adapt the network topology change. Additionally, we propose a multi-hop routing scheme for inter-cluster communications using a cost function conceived in such a way to reduce and balance the energy consumption of sensor nodes. As well, an algorithm to tolerate the failure of CH and relay node is proposed. We have presented various simulation results using different scenarios. It has been shown that the proposed protocol outperforms the existing protocols, namely BPA-CRP and EA-CRP in terms of network lifetime, energy efficiency, and packet delivery ratio. In future, we will give a closed-form solution for the fault tolerance of the ECRP. Besides, our attempt will be made to handle node mobility as well as implement and test the performance of ECRP in an energy-constrained real scenario like forest fire detection.