Keywords

1 Background and Motivation

Wireless Sensor networks (WSNs) are used in many applications, such as health-care, environment, industry, environment monitoring surveillance in military battle field and so on. Scientists have the mission to find an efficient way in order to influence their future. They must overcome the major problems which WSNs are facing today, such as routing information in the network or the rapid depletion of energy [1].

According to literature, there are many routing schemes in WSN. Network structure can be flat or cluster-based [2]. Grouping sensor nodes into clusters provides an efficient and scalable network structure [3]. Various distributed cluster based routing protocols were proposed for WSNs. LEACH (Low Energy Adaptative Clustering Hierarchy) [4] is one of the wellknown of them. In this protocol, at each round CHs are selected in arbitrary manner. Farther node can be selected as CH and this may lead to both CH and its members dissipate their energy quickly which constitutes a real weakness. Other centralized cluster based algorithms have also been developed for WSNs. As an example, LEACH-Centralised (LEACH-C) [5], a version of LEACH where selection of CHs is done by BS. After selection phase of CHs, BS inform all nodes about clustering. Thus, in large scale networks, that leads to energy exhaustion. That’s why centralized approaches are applied only in small networks.

However, the classic solutions for cluster based routing protocols are not reliable for a large scale network because that requires a significant treatment to find the optimal solution. An efficient solution is using meta-heuristic approaches such as genetic algorithm (GA). GAs are based on the biological mechanisms such as the laws of Mendel and the fundamental principle of selection of Charles Darwin [6], the methods of evolutionary programming developed by Fogel [7] and the evolutionary strategies developed by Rechenberg [8] and Schwefel [9]. There are many GA based approaches for clustering in WSNs. A mechanism called “Economic Control Couverture Algorithm” (ECCA) [10], inspired by the multi-objective genetic algorithms (MOGAs) [11]), is used to maintain full coverage by optimizing the number of active nodes. Bhondekar et al. [12] proposed a low energy dissipation solution. The GA decide the selected active sensors, choose CHs and adjust the non-CH nodes transmission capacity. Contrariwise, the algorithm does not use intra-cluster multi-hop communications. Ferentinos and Tsiligkaridis [13] presented an “Adaptive design optimization of wireless sensor networks using genetic algorithms” (ADOGA). It has sophisticated features that make it capable to decide the activity/inactivity of the sensors. It manages the different roles of the sensors in function of their wireless signal. Bayrakli and Erdogan [14] proposed the algorithm GABEEC (Genetic Algorithm Based Energy Efficient Clusters). In this protocol, GA is applied to maximise time duration of network by finding the optimal number and location of CHs in each round. However, clusters number are fixed during the network lifetime. This leads to an unbalanced clusters. Hussain et al. [15] proposed an intelligent energy-efficient hierarchical clustering protocol named HCR-GA. It is applied only in small networks and it is not scalable. Song et al. [16] proposed a flat GA for energy-entropy based multipath routing in WSNs (GAEMW). In this solution, all nodes have equal functions and responsibilities. Every sensor is a transceiver/receptor in the same time, which may deplete its energy very quickly. Mohammed et al. [17] presented a new Genetic Algorithm-based Energy-Efficient adaptive clustering hierarchy Protocol (GAEEP) to preserve node’s energy by finding the best CHs using GA. Thus, GAEEP improves the stable period and maximizes the lifetime of WSNs.

However, none of the above algorithms consider the multipath intra-cluster data routing. Even, none of them except [16] combined the advantages of hierarchical distributed mechanisms with multipath transmissions. The present paper is distinguished with regards to others in the literature by the following issues.

  • 1. It deals with both GA based clustering and routing algorithms whereas literature considers only one of them based on GA.

  • 2. Proposed GA based clustering saves energy using residual energy of nodes and the density criterion.

  • 3. Proposed GA based routing performs data delivering using intra-cluster multipath transmissions.

The novelty of this work stands in the development of a new Low Energy hierarchical Clustering and Routing protocol based on Genetic Algorithm (LECR-GA) to efficiently maximize the lifetime and to improve the WSNs quality of service (QoS). The proposed clustering algorithm takes into account new parameters to optimize the mechanism of CHs selection. The improved algorithm introduced two characteristics including energy and the weight of a node (i.e. number of neighbours of the neighbours). By using the GA, the entire space of research was explored to arrive at a desired optimization (best number and location of the CH), therefore, an economy of energy. The routing mechanism finds out the optimum path from all members of the cluster to the CH by choosing the most beneficial route, going through sensors having more energy and less distance to achieve data. The LECR-GA is divided into rounds. Each round begins with a clustering (set-up) phase where the optimum number of CHs is found and non-CH nodes join the nearest CH. Followed by a routing (steady-state) phase, where the sensed data are transferred to the end user. Finding good values to genetic parameters is sometimes a delicate problem. However, the efficiency of these algorithms depends strongly on the choice of genetic operators used during the process of the population diversification and in the exploration of the solutions space. The performance of the LECR-GA is compared with protocols of literature using Network Simulator 2 (NS2) simulation.

Table 1 summarizes the properties of some classical and based GA protocols to save energy.

Table 1. Summary of properties of energy saving protocols.

2 LECR-GA: Low Energy-Efficient Clustering and Routing Based on Genetic Algorithm

A presentation of our GA based algorithm is now given. It takes place in “rounds” which have approximately the same predetermined time interval. As shown in Fig. 1, each round consists of a clustering phase and a routing phase. Figure 2 is the detailed steps of the members organization in layers.

Fig. 1.
figure 1

Flow Chart of the proposed LECR-GA protocol

Fig. 2.
figure 2

Flow chart of organization members in layers

In the proposed LECR-GA protocol, the following assumptions about network model are fixed:

  • The BS is not limited in terms of energy, memory and computing power.

  • The BS is located outside the WSN.

  • All sensor nodes and the BS are stationary during all the simulation.

  • All sensors have a device to determine location.

  • The WSN includes homogeneous sensor nodes.

  • Initially, all nodes sensors have the same amount of energy.

2.1 Proposed Clustering Algorithm Based on GA

Our main focus is to maximize the network lifetime. This objective depends on the selection parameters of CHs and the cluster formation. Best number and positions of CHs are attended using the appropriate genetic parameters and energy of sensor nodes is minimised by choosing the nearest CHas performed by Algorithm 1.

Chromosome Representation: The chromosome is represented as a set of genes. Each gene is the concatenation of the values of the residual energy and the distance between the node and its neighbours. This explains our use of the real encoding genes. The number of live nodes indicates the size of the chromosome.

Initial Population: A set of chromosomes (all the living nodes of the network) is generated to create the initial population.

figure a

- The calculation of the average energy E(avg): It is done using formula 1.

$$\begin{aligned} E(avg)=\frac{ E_{S_1} + E_{S_2} + E_{S_3} +...+ E_{S_N} }{N} = \frac{\sum _{i=1}^{i=N} E_{S_i} }{N} \end{aligned}$$
(1)

where \(E_{S_i}\) is the residual energy of the node and N is the number of living nodes.

- Performances evaluation of initial population: After selecting the initial population, we must evaluate it according to energy criterion. If the residual energy of the node is greater than the average energy of the Network, the node can be a CH.

Selection: Once the performance evaluation of initial population step completed, the primary population on which another selection is made according to new criteria, is obtained. We look for all the nodes in the neighbourhood of the node \(S_i\). Then we calculate the distance separating these nodes and \(S_i\). At this level, we will introduce a new parameter, which we will call the weight of the node \(S_i\) described by formula 2:

$$\begin{aligned} Weight(S_i)={\frac{E_{S_i}}{ {\sum _{d=n}^{d=m} D(S_i,S_n) +...+ D(S_i,S_m)} }} \end{aligned}$$
(2)

where D is the distance between the node \(S_i\) and its neighbors and \(E_{S_i}\) is the residual energy of the node \(S_i\). The weight list is sorted in order to facilitate its manipulation.

Performance Evaluation of Selected Individual’s: After the list is sorted, the node in the 70% of the first cases in the weight list will participate in the new population for replacement.

- The computation of \(SNL_{S_i}\) (Neighboring-Life of node Sensor \(S_i\)) is made by introducing a new formula (3). It creates a relationship between the number of neighbors and the number of living nodes. More the \(SNL_{S_i}\) result is large, more likely, the node \(S_i\) will be a CH.

$$\begin{aligned} SNL_{(S_i)}=\frac{Neigh_{S_i}}{N} \end{aligned}$$
(3)

where \(Neigh_{S_i}\) is the number of neighbors of the node S and N is the number of the living nodes. The objective Fitness is described by formula (4). To be the best node elected as CH, the fitness function must be maximized. Obviously, the nodes having the maximum weight value, means that the transmission of data is more beneficial using these nodes. They need less energy and have more neighbours.

$$\begin{aligned} Fitness = max(weight) + max(SNL_{(S_i)}) \end{aligned}$$
(4)

Mutation and Crossover: During the application of our algorithm, we will not need either crossover or mutation since we will reach the optimum the first time. In this case, the probability \(p_c\) and \(p_m\) are very small and can be neglected.

Best Individual: Finally, after computing the node’s \(SNL_{S_i}\), the result is compared to that obtained by the function T(n).

$$\begin{aligned} {T(n)=} \left\{ \begin{array}{ll} \frac{p}{1-p*[r mod (\frac{1}{p}) ] } &{} \mathrm{n}\!\in \!\mathrm{G} \\ 0 &{} \mathrm{n}\!\notin \!\mathrm{G} \end{array} \right. \end{aligned}$$
(5)

where \(\mathbf p \) is the probability of CH in each round, which is the ratio of the total number of CH and nodes, \(\mathbf r \) is the current round number, \(\mathbf G \) is the set of nodes that have not been selected recently in \(\frac{\mathbf{1}}{\mathbf{p}}\) of the rounds. If \(SNL_{S_i}\) is strictly greater than T(n), the node is CH in this round. All elected nodes must inform the other nodes of the network of their decisions, using a maximum transmission radius to be heard by all the nodes. After receiving all warning packets, non-CH nodes join the best CH (nearest CH).

2.2 Proposed Routing Algorithm Based on GA

After cluster formation, the cluster space was divided into four layers over the cluster area (Circle centered by the CH with a radius equal to the distance of the furthest node from the CH). This will enable us to choose the next hop according to the distance criterion. Each layer was then divided into regions (North-east, North-west, South-East...) for choosing the next hop according to the energy criterion. Once this grouping is completed, we can apply our GA by Algorithm 2 to obtain the best paths as shown in Fig. 3.

figure b
Fig. 3.
figure 3

Path construction with GA

Layer Cluster Division: One of the most striking features of large scaling network is managing energy consumption. It should be noticed that the communication in clusters with a large number of nodes, quickly exhaust the CH battery. To solve these problems, a new concept called “Tree Trunk Approach” was introduced. After determining the optimum number of CH and their locations, as well as the clusters formation, the “Tree Trunk Approach” will be applied. A layered network is formed and the members of the cluster are placed in these layers as illustrated in Fig. 4. In this case, each node of the cluster knows exactly the nodes that are placed in the upper and lower layers and communicates with them. It can be noticed that the center of our tree trunk is represented by the CH. The “Nearest Node” is the closer node to the CH and “min” is its distance, and the “Farthest Node” is the further node from the CH and “max” is its distance. Based on their positions, the cluster was cut into four layers. The option of this number of layers was made to have several hops, necessary for the application of our GA. To do this, the distance between each member node and the CH was computed. Next, each node was assigned to its layer by referring to Table 2 where A is calculated using the following formula 6.

$$\begin{aligned} A = (max - min)/4 \end{aligned}$$
(6)
Fig. 4.
figure 4

Layer cluster division

Table 2. Interval layers

Cluster Division in Regions: This new division was used to make a more appropriate choice of the next hop using the energy criterion. This second division is made in order to avoid falling in the case where the chosen node (having the maximum energy) is distant.

Calculation of the Next Hop According to the Energy Criterion: The idea is to look for the next hop in the adjacent upper layer and in the same area. If no node verifying these conditions was found, we search subsequently in the upper layers that follow. If still no node was found, the CH will be the next hop. The combination of the chosen nodes formulate EnergyPath.

Calculation of the Next Hop According to the Distance Criterion: With the distance criterion, the nearest node in the upper layer is chosen as the next hop. When the layer L0 is concerned, the next hop is always the CH. The combination of the chosen nodes formulate DistancePath.

Application of the GA: After finding the next hop of each node (according to both criteria), the GA can be applied to find the optimal (beneficial) route. The proposed algorithm for routing from cluster members to CH is now presented.

Gene Representation: A gene is represented by a string determining the member nodes. Each gene represents the identity of the node (using the real coding). A group of nodes including the CH represent a set of genes (chromosome). The number of hops indicates the size of the chromosome.

Initial Population: At this step, each node has two ways (chromosomes) to route the data to the CH. One is according to the energy criterion, the other is according to the distance. These two paths will be our initial population (two parents). Moreover, the probability of crossover is equal to \(p_c = 0.99\) and the probability of mutation is equal to \(p_m = 0.01\) or \(p_m = 0\) (no mutation).

- Calculation of Distance, Hop and Energy:

Distance is the distance traveled along this path, Hop is the number of hops and Energy is the total energy of the nodes forming the route.

- Performance evaluation:

To evaluate the performance of the two routes that we obtained in the initialization phase, the fitness function is introduced in formula 7:

$$\begin{aligned} fitness = (0.04 * Distance) + (0.02 * Hop) + (0.94 * Energy) \end{aligned}$$
(7)

where 0.04, 0.02, and 0.94 are fixed constants for the entire simulation. Several coefficients have been tested and these one chosen gave us a better simulation results than others. We will try to minimize this function, since Min(fitness) shows that the packet travelling along this path, travels less distance, passes through a reduced number of hops and has a very high energy. The energy coefficient is large since the energy criterion is favoured.

- Crossover:

If the number of hops is equal for both routes, a 1-point crossover is applied as shown in Fig. 5. Crossover at 1-point will create two children from both parents so that each child has a part of each parent chromosome. For this, we choose an integer i as i \(\in [1, n[\), the first child will copy the genes 1, ..., i of the parent 1 and the genes i + 1, ..., n of the parent 2, and reciprocally for the second child. When there are a number of different hops, the optimal route is necessarily one of the two initial routes.

- Mutation:

The mutation is applied only if the fitness of the crossover is greater than the fitness of the two initial routes. An illustrative example is shown in Fig. 6. Mutation here consists of performing a random alteration of one gene of the chromosome.

Fig. 5.
figure 5

Crossover at 1 point \((i=2)\)

Fig. 6.
figure 6

Mutation at one 1 point \((i=3)\)

Reorganization of the Members in the Layers: After obtaining the optimal routes, each node will know its next hop. It can happen that a node, located in layer L3, will have its next hop in layer L1. In order to avoid synchronization problems, the layers were reorganized.

Creating the TDMA and Predecessor Tables: When each node has found its place in the layers, the TDMA table can be constructed by concatenating the lists L3, L2, L1 and L0. The distant members will transmit first. Having already assigned to each node its next hop, the table of predecessors can be structured.

Data Transmission: The CH remains in a sleep mode until the turn of the nodes in L0 for transmission arrives. When the nodes have received the tables (TDMA, predecessor and next hop), each one will use the Duty-Cycle technique to program the time of its sleep and its awakening. The Duty-Cycle is a technique used by the node sensor to save energy by switching periodically between the sleep mode “sleep” and the active mode “awake”. The main idea is to reduce the unnecessary activity time of the node, the sensor wake up only in the time of the transmission or reception of data. The functions of data transmission to the upper layer and to the CH aggregate the data after each reception. When the CH received the data from all the members of L0 with the data of their predecessors, it will transmit them to the BS with a single hop.

3 Simulation Results and Analysis

In this section, simulations using NS2 were performed to analyse and evaluate the performance of the proposed Protocol. To eliminate the experimental error caused by randomness, each experiment was run for 10 different times and the average was taken as the final result. The parameters used are described in Table 3.

Table 3. Simulation parameters

In Fig. 7 the performance of the GAEEP and LECR-GA protocols was evaluated in terms of the number of dead nodes. From this Fig. 7, it is noticed that the of first node death (FND) after 1000 s and all nodes died after 1175 s in GAEEP. However in LECR-GA protocol, the FND after 490 s and all nodes died after 1980 s. The LECR-GA prolongs the network lifetime by 35% compared to the GAEEP protocol, because the residual energy of sensors nodes in the proposed protocol decreases more slowly. This is related to the fact that the LECR-GA protocol always selects the nodes having higher energy than the average energy of the network to be CHs.

Fig. 7.
figure 7

Alive nodes number

Fig. 8.
figure 8

Packets number function of alive nodes number

Different network configurations are used in this simulation, for example the BS is located out of the network area (200 meters away) and their are 200 nodes (scalability) dispersed randomly in 100\(\,\times \,\)100 \(\mathrm{m}^2\) area. Figure 8 shows the amount of data received by the BS according to the number of alive nodes for the GABEEC, HCR-GA and LECR-GA protocols. The main purpose to compare HCR-GA and GABEEC was to know if our protocol LECR-GA keeps the same efficient performance even in large scale networks. We note that with 200 nodes, the number of data packets received was 70000 using LECR-GA, only 12000 using GABEEC and 2000 using HCR-GA. As the number of living nodes decreases, the BS continues to receive a phenomenal number of data packets until reaching 125000 data packets using LECR-GA, corresponding to the total extinction of the nodes. At the moment when the variation of the number of data packets received was only between 15000 and 20000 data packets for the GABEEC protocol, and between 8000 and 10000 data packets for the HCR-GA protocol, at the end of the simulation. It can be noticed that LECR-GA protocol have more data rate than HCR-GA and GABEEC protocols. Distributed systems are favored when it comes to large scale networks, where generally centralized systems fail to control a very large number of nodes.

4 Conclusion

Wireless sensor networks, that proliferate many applications which can be critical such as health-care and surveillance of military battle field, face the challenge of limited energy capacity. To achieve an efficient energy consumption, a distributed protocol based on genetic algorithm, namely LECR-GA, is proposed. Adopting the right choice of chromosome representation, fitness function and GA operations have led to longer system operability and maximum data rate with low complexity. The experimental results showed that the performance of the proposed algorithms is better than GABEEC and GAEEP, in terms of energy consumption and throughput.