1 Introduction

Wireless sensor network (WSN) has been widely used in many fields including health care, transportation management, military surveillance, etc. The battery-operated sensors are placed in open fields without human attendance and they acquire data contiguously over time. Through the wireless network, this information is transmitted and aggregated at a server located far away from the deployed sensors. Sensors are usually energy constrained and recharging battery is often costly or economically unfavorable. Hence, the management of sensors is an important factor for extending the network availability. Besides minimizing the energy consumption of data acquisition by advancing the sensing technology, it is preferable to balance the energy level of sensor nodes so that the entire network achieves an optimum availability.

In order to enhance the network availability and scalability, several network topologies and management methods have been devised, among which grouping sensors into clusters has demonstrated many advantages such as enabling smaller routing table, conserving communication bandwidth, avoiding redundant exchange of messages, and reducing maintenance overhead [13]. In a clustering WSN, each cluster usually consists of at least one surrogate node, often referred to as the CH. The CH may be dynamically decided or preassigned by the network designer. The communication between the cluster and the base station is facilitated by this CH. The problem of choosing the CH in multi-hop clustering model is more complected than the single-hop model. This is because in a multi-hop model, the CH may need to choose another node as a parent node to connect it with the base station, whereas in a single-hop model we only need to determine the CH of each cluster that will communicate directly with the base station. Figure 1 shows the difference between the two models. To balance the energy level of all nodes, selecting appropriate CHs becomes a key step. There exist many methods in the literature that address this issue [413]. In addition, a set of studies about energy efficiency of WSN have been provided [4, 5, 7, 8, 14], and several protocols and algorithms have been proposed in order to get an optimal power conservation in cluster-based WSN [913, 15]. However, most of them are not appropriate to work with WSN in dynamic environments, where sensors are unattended and it is very difficult to recharge their battery. In addition, they are not working well in the large sensor field in which the multi-hop clustering model is preferable. Low energy adaptive clustering hierarchy (LEACH) is one of the most promising cluster-based routing protocols [16]. LEACH is a distributed single-hop clustering algorithm. There, it is assumed that all nodes can perform long-distance transmission to the basestation (BS). Thus, selecting the CH depends on the energy used by all clusters. HEED is another cluster-based algorithm [17] based on the node’s remaining energy, while a secondary parameter is the number of node neighbors. In this paper, we propose a new CH selection method based on GA, called DCH-GA, for both single-hop and the multi-hop cluster models. DCH-GA is designed to match the requirements of the dynamic environments by electing the CH based on six main features. These features are the remaining energy, the consumed energy, the number of nearby neighbors, EAD, the node vulnerability, and the degree of mobility. DCH-GA assigns each factor a degree of priority as explained in the Methodology section. To extend the network longevity, the count of clusters is determined automatically. Also, to avoid energy consumption at sensor nodes, DCH-GA runs the GA operations at the BS after each transmission round. This paper assumes that N sensor nodes are randomly distributed in a two-dimensional square field and that the sensor network has the following properties:

Fig. 1
figure 1

Clustering models in WSN

  • All nodes are able to change their location from time to time.

  • All nodes are able to send the data to the BS.

  • The initial energy of all the nodes is the same.

  • All nodes are energy-constrained.

  • The network lifetime depends on the round first node die.

  • The algorithm is working on multi-hop clustering model.

  • Links are not symmetric. So, the transmission power level may differ between all pairs of nodes.

  • All clusters are homogeneous, but they may have different sizes.

  • There is only one BS located outside the network sensing field.

The remaining of this paper is organized as follows: Sect. 2.1 describes the GA and our hypotheses to elect the CH using it. Section 3 describes the proposed DCH-GA algorithm, and the parameters of the CH selection process. Finally, some conclusions are drawn in Sect. 4.

2 Related Work

Intelligent algorithms provide adaptive mechanisms that exhibit intelligent behavior in complex and dynamic environments like WSNs [18]. Various authors [1822] discussed the routing protocols in cluster-base WSN based on intelligent algorithms such as RL, ACO, FL, GA, and NNs. On the other hand, many CH selection methods based on different techniques have been proposed. Local negotiated clustering algorithm (LNCA) presents a novel clustering algorithm, which employs the similarity of nodes readings as an important criterion in cluster formation. ACE clusters the sensor network in a constant number of iterations using the node degree as the main parameter. In GA-WCA, the load balanced factor is considered as one of the weights along with a sum of distances from all neighbor nodes to CHs. LA2D-GA takes only the distance as a parameter to calculate the fitness function; also the representation of a chromosome in a two-dimensional grid represents valid statistics of a WSN [23].

Low energy adaptive clustering hierarchy (LEACH) [24] was proposed to create clusters in dynamic environment. However, the number of clusters must be determined in advance. Moreover, it can not be used in a multi-hop clustering environment. M-GEAR [25] is a clustering method in which the sensor nodes are divided into four logical regions depending on their locations in the network field. It needs an external gateway to be located at the center of the field while the BS is located out of the sensing field. M-GEAR selects CHs probabilistically in each region. Stable election protocol forms the network clusters by selecting sensor nodes as CHs using a weighted probability [26]. Lindsey and Raghavendra extended the LEACH method and proposed the PEGASIS algorithm [27], in which each node communicates only with the closest neighbor and takes turns when transmitting to the base station. The threshold sensitive stable election protocol (TSEP) [28] is another method in which there are three different levels of energies for the nodes, and the CHs selection is threshold-based. However, the TSEP method is designed for heterogeneous networks. Elbhiri et al. [29] developed a distributed energy-efficient clustering procedure that balances the CH selection by taking residual energy into consideration.

CHSCDP [30] is designed as an effective algorithm to select the optimal CHs among all nodes fairly, and tried to reduce the energy consumption to extend the lifetime of network. However, this method did not provide additional details about its ability to work in long area field. In addition, it assumes that the working environment is static and all its components are pre-known. In [31], a GA-based CH selection method is proposed for centralized clustering WSN. It aimed to create load balanced network than the traditional clustering algorithm. However, the GA running time still represents a big challenge for this method. LEACH-MAC [32] is proposed to form the optimum structure of WSN by forming its clusters. Although, LEACH-MAC achieved better network performance in terms of the required time to form the network clusters, it determines the count of the clusters in advance. To best of our knowledge, the count of the clusters can only be determined depending on the current environmental status of all sensor nodes. Thus, LEACH-MAC can not be used for the applications that need dynamic environments.

In Ref. [33], the Degree of connectivity is the main factor in choosing a CH. The degree of connectivity of a node, i.e. the number of its most immediate neighbors, is also a criterion that seems interesting to study. Intuitively, the more neighbors a sensor has, the more it seems to become an appropriate candidate as a CH, since a sensor with a low degree of connectivity might have little information retreival from its neighborhood to aggregate and to forward to the BS. In the initial phase, each sensor is involved into the neighborhood information exchanges (hello protocol), which allows it to determine its degree of connectivity and the location of BS. In EEUC [34], the distance between the node and the BS represents the main parameter for selecting the CH. The EEUC resulted in a network that is partitioned into clusters of unequal sizes, where clusters closer to BS have smaller sizes than those farther away from the BS.

2.1 The Proposed CH Selection Factors

Given the coordinates of the sensor nodes, the distance between two of them can be approximated with EAD. The value of EAD is computed depending of the Euclidean distance and take into account the environmental hindrances between the node and all other nodes in its cluster, such as trees or buildings. To calculate the EAD for the node v we assume that the distance between the BS and the node v is ED \(_{v,BS}\). Also, we assume that the Euclidean distance is used to calculate the distance ED \(_{vc}\) between v and the center of its cluster. Each hindrance between v and any other nodes in the cluster is H and the total number of hindrances inside the cluster is nh. The number of nodes in the cluster is n. Each hindrance has a weight W which represents it’s ability to consume the node energy. Let W = \(\{w_1, w_2, w3, \ldots , w_{nh}\}\) where each H has an ID that is represented by the index i of \(w_i\). So, the EAD can be represented as Eq. 1.

$${\textit{EAD}}_v=\frac{{\textit{EAD}}_{vc} + \sum _{v=1}^n \sum _{j=1,v\ne j}^n \sum _{k=0}^{nh} (H_{vj})* w + {\textit{EAD}}_{v,BS}}{2}$$
(1)

The consumed energy E, represents the energy needed to transfer the aggregated message from the cluster to the sink. According to the first order radio model [22], the consumed energy required to send a message with k bits length along a d distance is computed through Eq. 2 for a cluster with k member nodes, the cluster transfer energy is defined as follows:

$$E=\sum _{j=1}^k E_{T_{jh}}+ k E_R + E_{T_{hs}}$$
(2)

The first part of Eq. 2 shows the energy consumed to transmit messages from k member nodes to the CH. The second part shows the energy consumed by the CH to receive k messages from the member nodes. Finally, the third part represents the energy needed to transmit from the CH to the BS. The energy required (\(E_{T_{ij}}\)) to transmit a message of length l bits from a node i to a node j and the energy consumed in receiving the l-bit message (\(E_R\)) are given as follows:

$$E_{T_{ij}}= l E_e +l_{\epsilon _l} d_{ij}^4$$
(3)
$$E_{T_{ij}}= l E_e +l_{\epsilon _s} d_{ij}^2$$
(4)
$$E_{R}=l E_e +l E_{BF},$$
(5)

where d is the distance between nodes. However, for a long range transmission such as from a CH to the BS, the energy consumed is proportional to \(d^4\). The \(E_{BF}\) represents the cost of beam forming approach to reduce the energy consumption. Also, the remaining battery power of a node v (donated by RP(v)) can be calculated as

$$E_{{RP}_v}=E_s -\sum _{t=0}^c (E_{Tv}(t)+ E_{Rv}(t)),$$
(6)

where Es is the initial energy of the node and c is the current time. The number of neighbors of the node v (donated by N(v)) is the count of nodes whose distance d from v less than or equal to \(\beta\). The \(\beta\) value differs from one application to another according to the coverage space of the WSN.

$$N(v)= |n_i| \Leftrightarrow {d_{{vn}_i}} \le \beta{:}(v \ne n_i \&\beta >0)$$
(7)

Computing the running average of the speed for every node till current time T gives a measure of its mobility [20]:

$$M_v=\frac{1}{T}\sum _{}^T \sqrt{(x_t - x_{t-1})^2+(y_t - y_{t-1})^2},$$
(8)

where \((X_t, Y_t)\) and \((X_{t-1}, Y_{t-1})\) are the coordinates of the node v at times t and \(t-1\), respectively. Finally, and based on [35], a node that has high value of vulnerability means that failure of such node may lead to the disconnection of the entire network. Thus, we should avoid this node to be CHs. Vulnerability index VI of the node v can be calculated as Eq. 9.

$$VI_v=\frac{N^v_{before}}{N^v_{after}}\times \frac{L^v_{before} +1}{L^v_{after} +1}$$
(9)

Here, \(N^v_{before}\) is the number of nodes before removing v node, \(N^v_{after}\) is the number of nodes after removing v node. Likewise \(L^v_{before}\) is the number of levels before removing v node, and \(L^v_{after}\) is the number of levels after removing v node.

3 Methodology

3.1 Data Representation

Each factor is represented by a set of bits depending on its priority as shown in Table 1. These priorities represent one of the hypothesis we need to prove during this contribution using GA. At Table 1, the symbols H, L, and M are used to indicate to the priority degree whether it is high, low, or medium priority respectively. These priorities will affect the fitness value of the node. \(F_1\), \(F_2\), and \(F_5\) are represented with only one bit because the domain value of each one has only two values. But \(F_3\), \(F_4\), and \(F_6\) are represented by three bits each to get more flexibility because the domain contains four values. Also, this representation (of \(F_3\), \(F_4\), and \(F_6\)) produce logical results during the mutation process. An example of a chromosome that represents all features in direct binary representation (DBR) is shown at Fig. 2. The fitness function f(x) evaluates the fitness value of each sensor node. Its value depends on the six features as shown in Eq. 10

$$f(x)=f (F_1, F_2, F_3, F_4, F_5, F_6)$$
(10)
Fig. 2
figure 2

Chromosome representation

Table 1 Sensor node features

So, the nodes with \(F_1=(0)\), \(F_2=(0)\), \(F_3=(000)\), \(F_4=(111)\), \(F_5=(1)\) and \(F_6=(000)\) have high fitness values and can be declared as CHs. And the node with the lowest fitness value can be ignored from the list. The fitness function of our problem can be set as in Eq. 11.

$$f(x)=\frac{1+F_4^2 + 2F_5}{1+F_6^2+F_3^2 + F_2 + F_1},$$
(11)

where the remaining energy \((F_4)\) has the highest priority, and the number of neighbors \((F_5)\) has a medium priority. In addition to the fitness function, we shall calculate the probabilities of selecting \(P(S_i)\), the expected count of select (\(\overrightarrow {{EC}}\)), and the actual count of select for each chromosome as shown in Eqs. 12 and 13:

$$p(s_i)= \frac{f(i)}{\sum _{x=1}^n f(x)}$$
(12)
$$\overrightarrow {{EC}}=\frac{f(i)}{\frac{1}{n}\sum _{x=1}^n f(x)}.$$
(13)

Recall that n is the number of the chromosomes in the population. For example, let us begin with a random initial population of size 4 (Table 2). A generation of the GA begins with reproduction. We select the mating pool of the next generation by spinning the weighted roulette wheel six times. So, the best node representation get more copies, the average stay even, and the worst die off and will be excluded in further processing. To do that, we calculate \(\overrightarrow {{EC}}\) and \(P(S_i)\) as shown in Eqs. 12 and 13. The actual count value is the result of Roulette Wheel work. The Mate and Crossover values are randomly selected. Table 3 contains the next generation that resulSt from the GA work acting on our initial population. It shows that CHs have improved in the new population. Nodes that do not exist in the new generation will be ignored from the competition process in the future. In case a new node is not present in the current cluster, then it will be excluded also (Table 4).

Table 2 Example of initial population
Table 3 Example of using GA for CH selection
Table 4 Example of using GA for CH selection

3.2 Phase I: CH Selection Using GA

GAs are adaptive search methods that simulate some natural processes: selection, information, inheritance, random mutation and population dynamics [36]. A GA starts with a population of strings and thereafter generates successive populations of strings. A simple GA consists of three operators [19]: Crossover, Reproduction and Mutation. The chromosome of the GA contains all the building blocks to the solution of the problem at hand in a form that is suitable for the genetic operators and the fitness function. In our problem, each individual sensor node is represented by 6 features. Based on the CH selection factors that have been described previously, the CH selection algorithm is listed at Algorithm 1. In addition, its corresponding flowchart is shown at Fig. 3.

Fig. 3
figure 3

Cluster model of WSN

figure g

The creation of a population generates n number of nodes and represent each one in 12 bits. Returning the CH takes three arguments and searches for the best CH according to the fitness value. The next generation algorithm performs the crossover, mutation, and reproduction processes and returns a new generation of nodes. N, PC , PR, PM, \(\theta\), and T denote the population size, crossover probability, reproduction Probability, mutation probability, threshold, and time interval, respectively.

Figure 3 shows the main parts of the flowchart. It starts with generating the initial population of chromosomes of size n and calculates the values of the six features for each chromosome. Then it calculates the fitness function for each chromosome and applies the roulette wheel to compute the actual count of selecting the chromosome. After that the processes of mutation, reproduction, and crossover will performed according to their ratio. These processes will be repeated until the termination condition will be achieved. After that the CH will be selected according to the fitness value.

3.2.1 Phase II: Multi-hop Model Construction

The multi-hop clustering model is a special case of the clustering model in which a CH can not transmit the data directly to the BS for whatever reason. In this case, the CH tries to find another CH closer to the BS to work as a gateway between them. Now, assume that we have a WSN that contains a set of clusters that are formed by DCH-GA. Each cluster has a specific CH for a fixed round of data transmission. After each round, the BS selects a CH for each cluster depending on the available information regarding all nodes such as the remaining energy, location, and neighbors. Also, let us assume that some of CHs cannot transmit data directly to the BS if the distance between them is more than a certain threshold \(\alpha\). Each of them must transmit its data to an intermediate CH between it and the BS. So, we propose a simple algorithm for constructing the multi-hop model after creating the clusters and determining the CH for each one. The goal of this algorithm is to extend the network lifetime by choosing the path with minimum energy consumption. We assume that this algorithm will run at the BS after each round of sending data to it from all CHs to reconstruct the network hierarchy. The BS will run it for the CH that is far away from it more than a threshold value \(\alpha\). It evaluates a CH n by combining g(n), the cost to reach the CH, and h(n), the cost to get from the CH to the BS. The algorithm selects the CH with minimum f(n). Let us suppose that the CH that we need to determine the best path from it to the BS is given by i, the remaining energy of CH n is \(E_n\), and the count of hindrances between i and n is \(|H_i , n|\). The cost function can be calculated as Eqs. (14), (15), and (16).

$$f(n)= g(n) + h(n)$$
(14)
$$g(n)= ED_{i,n} + E_n + |H_i , n|$$
(15)
$$h(n)=ED_{n,BS} + |SN_n|,$$
(16)

where \(|SN_n|\) refers to the counting of nodes in the cluster. This \(|SN_n|\) is included in the calculation to indicate the traffic loads of the CH. The ED represents the expected distance between two nodes and is calculated based on the EAD. The ED can be calculated as the blow example at Fig. 4. After several iterations, the algorithm selects the path \(\varrho [x, BS]\) from the CH x to the BS, which is the minimum energy path. The protocol assumes that each CH can reach any other within its environment as shown in Algorithm 2 where N is the count of the CHs. Figure 4 shows an example of a multi-hop intelligent algorithm. The example contains three clusters and one BS. For each cluster, one CH is defined along with its features. The features of the CH are ID, coordinates, Remaining Energy, and Number of Nodes in its cluster, respectively, as shown in the figure. Assume that the CH C cannot interact directly with the BS. It needs another CH to works as a mediator between them. Depending on its signal strength, it can select between A and B. Applying our proposed DCH-GA algorithm, C will calculate the cost function f(A) and f(B) for A and B, respectively as shown at Table 5. From Table 5, the node C will select the CH B to be its parent because it has the minimum cost value.

Fig. 4
figure 4

An example of the proposed multi-hop intelligent algorithm

figure h
Table 5 Cost Function for CHs. Where \(ED_{BS}\) is the EAD from the CH to the BS while \(ED_C\) is the EAD between it and the CH C

Given the coordinates of sensor nodes, ED between two of them can be approximated taking into account their remaining energy. The values of ED are computed by combining the Euclidean distance between the nodes and the environmental hindrances. The hindrances between the node and all other nodes in its cluster such as trees or buildings are taken into account. To calculate the ED between the node V and the node B, we assume that the Euclidean distance between B and V is \(D_{VB}\). Also,we shall assume that the Euclidean distance is used to calculate the distance \(D_{VC}\) between V and its CH. Each hindrance between V and any other nodes in the cluster is H and the total number of hindrance inside the cluster is nh. The number of nodes in the cluster is n. As mentioned before, each hindrance has a weight W. By modified EAD, the ED can be represented as Eq. (17)

$$ED_V= D_{VC} + \sum _{i=1}^n \sum _{j=1, i \ne j}^{n} \sum _{i=0}^{nh} [ H_{ij} w_i ] + D_{VB}$$
(17)

In case of retrieving the distance between a CH i and the BS, ED can be calculated as a simple Euclidean distance (\(ED_{i,BS} = D{i,BS}\)). Algorithm 3 shows the main steps of the proposed ED algorithm.

figure i

4 Experimental Results and Discussion

To evaluate DCH-GA, we repeated ten times the simulations and reported the average performance. Comparison studies were conducted with some other methods reported in the literature. Table 6 lists the network parameters used in our experiments and two scenarios are created to study the routine method performance in small and large networks.

Table 6 Network properties

Sensors are randomly placed within a 100 meters by 100 meters area and the base station is placed at a corner of the field, i.e., at (100, 100). Figure 5 illustrates an example of node distributions. Sensors are depicted with small rectangular icons and the base station is marked with a larger rectangular icon.

Fig. 5
figure 5

An example of sensor and base station placement

Figure 6 illustrates the percentage of the remaining energy of sensor nodes at transmission rounds of 100 and 1000. Nevertheless, it is clear that the remaining energy is fairly equal with some fluctuations. That is, as a consequence of DCH-GA, the variance among remaining energy is quite low, which implies that the sensor nodes shared the burden of relaying messages and, hence, elongated the overall network life. The simulation is repeated to check the network life time. The following GA parameters are used for simulation: population size = 20, generation size = 120, crossover rate = 0.60, and mutation rate = 0.001. The comparison is shown in Table 7. Figure 7 depicts the change of the percentage of live sensor nodes throughout the entire network life.

Fig. 6
figure 6

The remaining energy of all sensor nodes a at network transmission round 100 and b at network transmission round 1000

Table 7 Network life time
Fig. 7
figure 7

Network lifetime with a initial energy 0.25  J/node, b initial energy 0.50 J/node, and c initial energy 1.0  J/node

The forming of clusters was greatly influenced by the spatial location of sensor nodes. It is interesting to see that the low-initial-energy nodes that served as cluster heads are usually far away from the high-initial-energy ones, which justifies their role as CH.

Table 8 lists the average remaining energy of sensors and its standard deviation at various transmission rounds. Due to uneven distances to the CH of the member nodes, the energy expenditure for sensors varied, and is thus inevitable that STDs continued to increase. However, the small STDs values indicate a balanced energy consumption among sensors.

Table 8 Remaining energy (J) variance of sensor nodes

To evaluate the efficiency of DCH-GA, Table 9 lists the average time used to structure clusters in each transmission round. The time reported is before the first node became unavailable due to energy exhaustion. The number within parenthesis is the standard deviation. The first column represents the initial energy of the sensor nodes.

Table 9 Average time (in seconds) for network structuring

5 Conclusion

In a clustering WSN, each cluster usually consists of at least one surrogate node, often referred to as the CH. The CH may be dynamically chosen or preassigned by the network designer. Communication between the cluster and the base station is facilitated by this CH. The problem of choosing the CH in a multi-hop clustering model is more complex than in a single-hop model. The multi-hop clustering model is a special case of the clustering model in which a CH cannot transmit the data directly to the BS.

There are six main significant factors for selecting a CH node in a multi-hop cluster model in WSNs. These factors are: the distance from cluster center, the vulnerability index, the degree of mobility, the degree of mobility, the remaining battery power, the number of nearby neighbors, and the consumed energy. All these factors are related when choosing CH and ignoring one of them will affect the network lifetime. The degree of priority differs for each factor. In the present contribution, we have proposed a new CH selection method based on GA, called DCH-GA, for both single-hop and the multi-hop cluster models. The procedure introduced here is designed to match the requirements of the dynamic environments by electing the CH based on six main features.

During examination, we repeated simulations and reported the average performance. Comparison studies were conducted with some methods reported in the literature. In the future work, we shall focus on secure data transfer between each node and the CH node taking into consideration the dynamic environment inside the cluster.