1 Introduction

In WSNs, nodes have limited battery power, limited transmission range, and limited processing and storage capabilities. The selection of a routing strategy is therefore an important issue for the efficient delivery of packets to their destination. Moreover, such a strategy, regardless of the application, must try to maximize network lifetime and minimize the energy consumption of the overall network. The sensor nodes are typically battery-powered and should operate unattended for a relatively long period of time. In most cases, it is very difficult or even impossible to change or recharge the batteries [1–3]. In a two-tier WSN, the sensor nodes are divided into several groups, which are called clusters. Each cluster has a manager, which is known as the cluster head (CH). All the sensor nodes sense local data and send it to their corresponding CH. Then, the CHs aggregate the local data and finally send it to the base station (BS) directly or via other CHs [4, 5].

Routing in WSNs is very challenging due to the inherent features that distinguish these networks from other wireless networks. Given the relatively large number of sensor nodes, it is very important to implement a suitable routing mechanism. Routing mechanisms should consider the inherent features of WSNs (e.g. resource and topological constraints) along with application and architecture requirements. One of the most effective routing mechanisms for reducing energy consumption and increasing network lifetime is the hierarchical mechanism. In hierarchical protocols, nodes are grouped into clusters and each cluster has a CH. For example, Fig. 1 shows a WSN with hierarchical routing [6–8].

Fig. 1 Single-hop data transmission from CHs to BS

The CHs are used for higher-level communication and for reducing the traffic overhead. Hierarchical routing has several advantages, including: (a) smaller routing tables stored at each individual sensor node, (b) savings in communication bandwidth, and (c) the use of optimized management strategies to prolong the battery life of the sensor nodes [9–11].

In this paper, a centralized genetic-based clustering (CGC) protocol using an onion approach is proposed. The CGC protocol performs the clustering operation and CH selection according to three criteria:

  1. The residual energy of the nominated node should be higher than the average energy of the whole network.

  2. A graph is formed among the nodes which are in the radio communication range of each other, and a weight is assigned to each edge of this graph. Then, the sum of weights for edges (SWE) connected to each node is calculated. A node with a maximal SWE is considered a proper candidate for being CH. SWE is used as one of the criteria for selecting the optimal CH because it leads to an affordable transmission cost between the members and the CHs.

  3. The node should not be an outlier. We define a function that indicates whether a node is an outlier or not. This criterion is used because nodes which cover a large number of other nodes have a better chance of being CH. Thus, the surface area covered by the CHs increases, which makes the network more stable.

The genetic algorithm (GA) is used to search a complicated search space and select optimum CHs. We propose an innovative fitness function which is to be minimized. The chromosome which minimizes the fitness function is selected by the BS and its nodes are introduced to the whole network as proper CHs. Furthermore, a novel concept called the onion approach is proposed for upper-level routing (between CHs). This approach divides the network into several layers, which are called “Onion Layers”, and leads to a reduction in transmission costs between CHs. Simulation results show that the CGC protocol achieves significant improvements in terms of the running time of the algorithm, the number of nodes alive, first node death (FND), last node death (LND), the number of packets received by the BS, and the energy consumption of the network. Our main contributions can be summarized as follows:

  • The proposed fitness function includes a parameter that considers the weight of the edges connecting the sensor nodes to their CHs. This is in contrast to the fitness functions used in traditional genetic-based clustering protocols.

  • A new concept called the onion approach, which has not been used in any previous work, is introduced as a method of communication between CH nodes. This method reduces the communication overhead among CHs and makes energy consumption more balanced.

  • Together, the proposed fitness function and the onion layering approach make the CGC protocol more affordable in terms of network lifetime, energy consumption, and packets received by the BS than the other traditional genetic-based clustering protocols.

The rest of this paper is organized as follows. Section 2 describes related work in hierarchical routing. Section 3 gives a brief description of the genetic algorithm. Section 4 explains the network and energy consumption model used in the proposed protocol. Section 5 presents the proposed protocol based on the genetic algorithm (CGC). Section 6 introduces a new concept called the onion approach for upper-level routing (among CHs). In Sect. 7, the proposed protocol is analyzed in terms of time complexity, and its performance is evaluated in terms of the following criteria: the number of dead nodes (or, equivalently, the number of alive nodes), first node death (FND), last node death (LND), the number of packets received by the BS, and the energy consumption of the network. Finally, Sect. 8 concludes this paper.

2 Related work

So far, various methods for clustering in WSNs and increasing the network lifetime have been proposed.

Low-energy adaptive clustering hierarchy (LEACH) [12] protocol is one of the most popular hierarchical routing protocols in WSNs. The operation of LEACH protocol consists of two phases. First, in setup phase the clusters are organized and CHs are selected. In this phase, a sensor node selects a random number between 0 and 1. If this number is less than the threshold T(n), the node becomes a CH. T(n) is calculated as follows:

$$\begin{aligned} T(n)= \left\{ \begin{array}{ll} \frac{p}{1-p \times (r \bmod \frac{1}{p})} &{} \quad \text {if} \; n \in G \\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(1)

In Eq. 1, r is the current round, p is the recommended percentage of CHs, and G is the collection of nodes which have not been selected as a CH in the last 1 / p rounds. Second, in the steady state phase the data is sent to the BS. The duration of the steady state phase is longer than the duration of the setup phase in order to minimize overhead. The LEACH protocol increases the network lifetime compared with the previous protocols, and also supports high scalability. Despite these advantages, CHs are selected randomly, so the optimal number and distribution of CHs cannot be ensured. Nodes with low residual energy have the same priority to be a CH as nodes with high residual energy. For this reason, nodes with less remaining energy may be chosen as CHs, with the result that these nodes may die first. Although the LEACH protocol provides random, adaptive and self-organizing clustering, it does not guarantee the number and placement of CHs. Therefore, the LEACH-Centralized (LEACH-C) [13] protocol was proposed. LEACH-C utilizes the BS for cluster formation, unlike LEACH, where nodes organize themselves into clusters. Initially in LEACH-C, the BS receives information about the location and energy level of each node in the network. With this information, the BS finds a predefined number of CHs and configures the network into clusters.
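The threshold in Eq. 1 can be illustrated with a minimal Python sketch. The function names `leach_threshold` and `elects_itself` are hypothetical, and the sketch assumes that 1/p is (close to) an integer so that it can be rounded to a whole number of rounds; it is not taken from the LEACH implementation itself.

```python
import random


def leach_threshold(p: float, r: int, in_G: bool) -> float:
    """Threshold T(n) of Eq. 1 for round r.

    p    : recommended fraction of CHs (e.g. 0.1)
    r    : current round number
    in_G : True if the node has not been a CH in the last 1/p rounds
    """
    if not in_G:
        return 0.0
    period = round(1 / p)  # assumption: 1/p is an integer number of rounds
    return p / (1 - p * (r % period))


def elects_itself(p: float, r: int, in_G: bool, rng: random.Random) -> bool:
    """A node becomes CH if its random draw in [0, 1) falls below T(n)."""
    return rng.random() < leach_threshold(p, r, in_G)
```

With p = 0.1 the threshold starts at 0.1 in round 0 and approaches 1 in round 9, so every node still in G is elected by the end of a 10-round period.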

A centralized balance clustering (CBC) routing protocol based on location was proposed in [14]. In CBC, a systematic clustering algorithm is designed in order to keep clustering balanced through the whole lifetime of the network and to adapt to the non-uniform distribution of sensor nodes. First, the algorithm determines the number of clusters according to the condition of the network, and adjusts the hexagonal clustering results to balance the number of nodes in each cluster. Second, it selects CHs in each cluster based on the energy and distribution of nodes, and optimizes the clustering results to minimize energy consumption. Finally, it allocates suitable time slots for transmission to avoid collisions. In [15], the LEACH protocol was improved with a sleep mode. The authors proposed four new hierarchical clustering topology architectures: random CH and sub-CH (RCHSCH), random CH and max energy sub-CH (RCHMESCH), random CH and sub-CH with sleep mode (RCHSCHSM) and random CH and max energy sub-CH with sleep mode (RCHMESCHSM). The proposed architectures involve three layers and are based on the LEACH architecture. According to the simulation results, the RCHSCH, RCHMESCH, RCHSCHSM and RCHMESCHSM architectures perform better than the LEACH architecture. In RCHSCH, sub-cluster formation is used to mitigate the problem that CHs die quickly in LEACH. RCHSCH is improved by forming RCHMESCH, wherein SCHs and RSCHs are elected based on the energy of the sensor nodes so that energy consumption can be balanced. Finally, the RCHSCH and RCHMESCH architectures are improved by adding a sleep mode, based on the correlation of sensor data within sub-clusters, to form RCHSCHSM and RCHMESCHSM.

Fig. 2 The steps of GA

In [16], the authors improved the clustering routing protocol using GA. In their study, the BS uses GA to create energy-efficient clusters for a given number of transmissions in WSNs. Each node is represented as a bit of a chromosome. A population consists of several chromosomes and the best chromosome is used to generate the next population. Based on the survival fitness, the population transforms into the future generation. Initially, each fitness parameter is assigned a conventional weight. After every generation, the fittest chromosome is evaluated and the weights for each fitness parameter are updated accordingly. The proposed technique uses a GA to determine the initial set of hierarchical clusters. The authors compared their proposed method with the LEACH protocol in different ways. Their method saves more energy than LEACH, especially as the number of nodes in the network increases. However, since the method proposed in [16] does not consider the density of nodes in different parts of the network while selecting CHs, it may select as CHs nodes that do not cover an adequate number of nodes. In [17], an optimal method of clustering homogeneous WSNs using a multi-objective two-nested GA, named M2NGA, is presented. The top-level algorithm is a multi-objective GA whose goal is to obtain clustering schemes in which the network lifetime is optimized for different delay values. The low-level GA is used in each cluster in order to get the most efficient topology for data transmission from the sensor nodes to the CH. The advantage of M2NGA compared with other heuristic clustering methods is its generality. Despite this advantage, the relatively large computational overhead of its implementation is considered one of its weaknesses. In [18], the authors proposed a new GA-based clustering algorithm to solve the load balancing problem in WSNs. The algorithm forms clusters in such a way that the maximum load of each gateway is minimized. In the initial population generation phase, they restricted the generation of the initial population by considering the connectivity between the sensor nodes and their CHs. Also, in the mutation phase, the mutation point is selected in such a way that it generates child chromosomes that ensure better load balancing. In [19], the authors investigated the problem of grouping the sensor nodes into clusters to enhance the overall scalability of the network. A selected set of nodes, known as gateway nodes, act as CHs for each cluster, and the objective is to balance the load among these gateways. Load-balanced clustering increases system stability and improves the communication between the various nodes in the network. Their proposed algorithm adopts a centralized approach which assumes that each node is aware of the network topology. They first showed that a special case of the load-balanced clustering problem (whereby the traffic load contributed by all sensor nodes is the same) is optimally solvable in polynomial time. They next proved that the general case of the load-balanced clustering problem is NP-hard.

In [20], the authors investigated the advantages and disadvantages of the LEACH protocol and then put forward a clustering routing protocol for balancing energy consumption based on simulated annealing and GA. They formed the clusters by simulated annealing and GA and then calculated the center of each cluster. If the energy of a node in the cluster is higher than the average energy of the cluster, it becomes a candidate CH; finally, a candidate CH becomes the CH according to its distance from the cluster center. The main operational difference between the protocol proposed in [20] and LEACH is the selection process of CHs, which is performed by simulated annealing and GA. Also, it is based on a centralized control algorithm that is implemented at the BS.

3 An overview of genetic algorithm

Genetic algorithm (GA) is a kind of meta-heuristic and evolutionary search mechanism based on natural selection and genetics. Evolutionary means that the initial population converges to an optimum solution during a specific procedure. Heuristic methods are problem-dependent techniques. As such, they are usually adapted to the problem at hand and try to take full advantage of the particularities of this problem. However, because they are often too greedy, they usually get trapped in a local optimum and thus fail, in general, to obtain the global optimum solution [21, 22].

On the other hand, meta-heuristic algorithms (like GA) usually have shorter running times than heuristic algorithms and are very simple. The probabilistic nature of their search is also the reason they are not trapped in local optima. The main reason that we used GA as an optimization method for the clustering problem in WSNs is that meta-heuristic algorithms are problem-independent techniques. As such, they do not take advantage of any specificity of the problem. In general, they are not greedy. In fact, they may even accept a temporary deterioration of the solution, which allows them to explore the solution space more thoroughly and thus to get a hopefully better solution (that sometimes will coincide with the global optimum). In our proposed method, the simulation results showed that 1000 iterations are sufficient to find a very good solution, because the optimum solution no longer changed after that number of iterations [23, 24].

In GA, an initial population of solutions consisting of n chromosomes is generated first. This population may be generated either randomly or from another solution which is close to the optimized model. Afterwards, all chromosomes belonging to this population are evaluated using a fitness function. Figure 2 briefly depicts the steps of GA. Some of the chromosomes belonging to the current population (current generation) are selected based on their desirability in order to generate the new population (new generation). The offspring are generated using evolutionary operators such as Selection, Crossover, and Mutation. When the new generation has been generated, the algorithm checks the termination condition. If the termination criterion is satisfied, the algorithm terminates; otherwise, the cycle repeats until the termination condition is met.
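The cycle described above (initialize, evaluate, select, cross over, mutate, check termination) can be sketched generically in Python. This is an illustrative skeleton and not the CGC implementation: the bit-string encoding, the binary tournament selection, the one-point crossover, and all parameter names and defaults are assumptions chosen to keep the example short.

```python
import random


def genetic_search(fitness, genome_len, pop_size=20, iterations=1000,
                   mutation_rate=0.05, seed=0):
    """Minimize `fitness` over bit-string chromosomes of length genome_len (>= 2)."""
    rng = random.Random(seed)

    def tournament(pop):
        # Selection (assumed here): the fitter of two random chromosomes wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    # Initial population: generated randomly.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)

    # Termination condition: a fixed iteration budget.
    for _ in range(iterations):
        next_generation = []
        while len(next_generation) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_generation.append(child)
        population = next_generation
        generation_best = min(population, key=fitness)
        if fitness(generation_best) < fitness(best):
            best = generation_best
    return best
```

For example, `genetic_search(sum, genome_len=8, iterations=50)` searches for a bit string with as few ones as possible; the fixed seed makes repeated runs reproducible.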

4 Network and energy consumption model

Consider one BS and the set of sensor nodes which are defined as follows:

$$\begin{aligned} S=[S_1,S_2,S_3,\ldots ,S_n]. \end{aligned}$$
(2)

In Eq. 2, n indicates the number of nodes which are distributed in a geographic area. Our goal is to select a collection of CHs which cover the entire area. Each sensor node is shown by \(S_i\), where \(1\le i \le n\). Also, we define the set of CHs as follows:

$$\begin{aligned} C=[C_1,C_2,C_3,\ldots ,C_m]. \end{aligned}$$
(3)

In Eq. 3, m indicates the number of clusters, where \(n \ge m\); that is, the number of CHs never exceeds the number of sensor nodes.

In the model of studied network, the following properties are assumed:

  • The sensor nodes are placed randomly and independently in a given environment and are homogeneous (in terms of computational and processing power, energy and memory).

  • Since the CGC protocol is a centralized method, sensor nodes are not engaged in the CH selection and cluster formation processes. All the clustering procedures are performed by the BS and the results are transmitted to the whole network. The BS is therefore assumed not to be limited in terms of energy, memory, and computational power.

  • The network structure consists of a BS and a number of sensor nodes that communicate with each other.

  • The network is divided into a number of clusters. Each cluster consists of several sensor nodes and is managed by its own CH. When CHs receive the messages from their members, they relay them to the BS. Each sensor node periodically senses a geographical area and sends the obtained information to the BS via its CH.

  • The initial energy of all nodes is the same and is defined by \(E_{primary}\). It should be noted that \(E_{primary}=E_{max}\).

Figure 1 illustrates a network which is divided into a number of clusters, each consisting of several sensor nodes. A WSN node consists of several modules: the Sensor Module, the Processing Module, the Wireless Communication Module and the Power Supply Module. These modules work together to carry out the sensing operation in a WSN environment. Thus, in order to evaluate the energy consumption of a WSN node, it is important to study the energy consumption of its modules.

The energy consumption of the sensor module is due to many factors such as signal sampling, AD (Analogue to Digital) signal conversion and signal modulation. The energy consumption of this module is also related to the sensing operation mode of the node (periodic, sleep/wake, etc.). The energy consumption in periodic mode is obtained as follows [25–27]:

$$\begin{aligned} E_{sensor} = E_{on-off} + E_{off-on} + E_{sensor-run}. \end{aligned}$$
(4)

In Eq. 4, \(E_{on-off}\) is the energy consumed when switching the sensor off, \(E_{off-on}\) is the energy consumed when switching the sensor on, and \(E_{sensor-run}\) is the energy consumed by the sensing operation itself.

The energy consumption of processing module is obtained as follows:

$$\begin{aligned} E_{cpu} = E_{cpu-state} + E_{cpu-change}. \end{aligned}$$
(5)

Processing module supports three operation states: sleep, idle and run. \(E_{cpu-state}\) is the state energy consumption and \(E_{cpu-change}\) is the state transition energy consumption.

The communication energy consumption of a node consists of the energy consumed for sending and receiving messages. As in [12], the energy consumed by a sensor node for sending data can be calculated as follows:

$$\begin{aligned} E_{tx}(l,d)=E_{elec}\times l + \left\{ \begin{array}{ll} \varepsilon _{fs}\times l \times d^2, &{} \quad d\le d_{0}\\ \varepsilon _{mp}\times l \times d^4, &{} \quad d > d_{0}. \end{array} \right. \end{aligned}$$
(6)

In Eq. 6, \(E_{tx}(l,d)\) is the energy consumed when transmitting an l-bit message over the distance d. \(E_{elec}\) is the electronic energy consumed per bit for coding, modulation, filtering and spreading. The distance threshold (\(d_0\)) is calculated as follows:

$$\begin{aligned} d_0 = \sqrt{ \frac{\varepsilon _{fs}}{\varepsilon _{mp}} }. \end{aligned}$$
(7)

In Eq. 7, \(\varepsilon _{fs}\) represents the amplifier parameter in a free space model when the transmission distance is shorter than \(d_0\) and \(\varepsilon _{mp}\) represents the amplifier parameter in a multi-path fading channel model when the transmission distance is longer than \(d_0\).

Also the energy consumption on receiving data is calculated as follows:

$$\begin{aligned} E_{rx}(l) = E_{elec}\times l. \end{aligned}$$
(8)

Finally, the total energy consumption for transmitting an l-bit message from a source node S to a destination node D over the distance d is obtained as follows:

$$\begin{aligned} E_{S,D}(l,d) = E_{tx}(l,d)+ E_{rx}(l). \end{aligned}$$
(9)

In Eq. 9, \(E_{tx}(l,d)\) is the energy consumed for transmitting an l-bit message over distance d, and \(E_{rx}(l)\) is the energy consumed for receiving an l-bit message.
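Equations 6 to 9 can be evaluated with a short Python sketch. The numerical values of `E_ELEC`, `EPS_FS` and `EPS_MP` below are illustrative assumptions (values of this order are common in the LEACH literature) and are not taken from this paper; the function names are likewise hypothetical.

```python
import math

# Assumed illustrative radio parameters (not specified in the text).
E_ELEC = 50e-9        # electronics energy, J/bit
EPS_FS = 10e-12       # free-space amplifier parameter, J/bit/m^2
EPS_MP = 0.0013e-12   # multi-path amplifier parameter, J/bit/m^4

# Distance threshold d0 of Eq. 7.
D0 = math.sqrt(EPS_FS / EPS_MP)


def e_tx(l: int, d: float) -> float:
    """Energy to transmit an l-bit message over distance d (Eq. 6)."""
    if d <= D0:
        amplifier = EPS_FS * l * d ** 2   # free-space model
    else:
        amplifier = EPS_MP * l * d ** 4   # multi-path fading model
    return E_ELEC * l + amplifier


def e_rx(l: int) -> float:
    """Energy to receive an l-bit message (Eq. 8)."""
    return E_ELEC * l


def e_link(l: int, d: float) -> float:
    """Total energy for one hop from source to destination (Eq. 9)."""
    return e_tx(l, d) + e_rx(l)
```

With these assumed parameters, \(d_0\) is roughly 87.7 m, and the two branches of Eq. 6 give the same value at \(d = d_0\), so the model is continuous at the threshold.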

5 The CGC protocol based on genetic algorithm

In this paper, GA is utilized to explore a complicated search space and perform the desired optimization. The output is a chromosome consisting of the optimum CHs with respect to the parameters defined below.

The operation of the CGC protocol is divided into rounds. As shown in Fig. 3, a round is a certain period of time that begins with a setup phase, in which the BS finds the optimum number of CHs and assigns member nodes to each CH, followed by a steady state phase, in which the sensed data are transferred to the CHs and collected in frames; these frames are then transferred to the BS. The two phases of each round are as follows [28]:

Fig. 3 Working cycle of the CGC protocol

The setup phase This phase consists of CH selection followed by cluster formation. During each setup phase, the BS receives information on the current energy status, location, and number of neighbors from all the nodes in the network. Based on this information, the BS selects proper CHs using an evolutionary approach (GA). After the appropriate chromosome has been selected by the BS (the selection of the appropriate chromosome is explained later), the BS introduces the IDs of these CHs to the whole network as proper CHs as follows:

$$\begin{aligned} \text {ADV-Member} = [\text {IDs of proper CHs}]. \end{aligned}$$
(10)

where ADV-Member is the advertisement message which is broadcast to the whole network. Moreover, the IDs of the members are sent to the CHs by an ADV-CH message as follows:

$$\begin{aligned} \text {ADV-CH} = [\text {Node's ID, CH's ID}]. \end{aligned}$$
(11)

Then, the cluster is established. The operation of the setup phase is shown in Fig. 4.

Fig. 4 The operation of setup phase in the CGC protocol

Finally, each CH creates the time division multiple access (TDMA) schedule by assigning slots to its member nodes and informs these nodes of this schedule. The TDMA schedule is used to avoid intra-cluster collisions between data messages and to reduce energy consumption in the cluster. Also, to reduce inter-cluster interference, every CH selects a unique CDMA code and informs all member nodes within the cluster to transmit their data using this spreading code.

The steady state phase In steady state phase, during each frame, every member node at the time of its respective time slot, sends sensed data to its CH (like [13]). Then, every CH forwards the aggregated data to the BS.

5.1 Fitness parameters for determining optimum CHs in CGC protocol

As mentioned before, clustering is performed by the BS and the results are transmitted to all nodes. The number of needed CHs determines the length of a chromosome. In the CGC protocol, a node is suitable for being CH if it satisfies the following conditions [28].

A. The residual energy of the nominated node must be higher than the average energy of the whole network. For this purpose, we define function E(avr) as the function to calculate average energy of the whole network and it can be defined by Eq. 12 as follows:

$$\begin{aligned} E(avr) = \frac{E_{S_1}+E_{S_2}+\cdots +E_{S_n}}{n}=\frac{\sum _{i=1}^{n} E_{S_i}}{n}. \end{aligned}$$
(12)

Also, function \(E(res)_{S_i}\) is defined to represent the residual energy of node \(S_i\). The energy criterion can then be written as:

$$\begin{aligned} E(opt) = \left\{ \begin{array}{l l} E(res)_{S_i} &{} \quad if \; E(res)_{S_i} \ge E(avr) \\ \text {Go to the next } E(res)_{S_i} &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(13)

In Eq. 13, E(opt) represents the optimal energy, and \(E(res)_{S_i}\) and E(avr) represent the residual energy of the sensor node \(S_i\) and the average energy of the whole network, respectively.
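Equations 12 and 13 amount to a simple filter over the nodes. The following Python sketch is one possible reading, with hypothetical function names and a dictionary mapping node IDs to residual energies as an assumed data layout.

```python
def average_energy(residual: dict) -> float:
    """E(avr) of Eq. 12: mean residual energy over all n nodes."""
    return sum(residual.values()) / len(residual)


def energy_candidates(residual: dict) -> list:
    """Nodes passing Eq. 13: residual energy at least the network average."""
    avg = average_energy(residual)
    return [node for node, energy in residual.items() if energy >= avg]
```

For instance, with residual energies `{1: 0.5, 2: 0.25, 3: 0.75}` the average is 0.5, so nodes 1 and 3 remain candidates and node 2 is skipped.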

B. The sum of weights for edges (SWE) connected to the nominated node should be maximized. To evaluate this criterion, the network with n nodes is mapped into a graph. This graph is formed among the nodes which are in the radio communication range of each other. Then, a weight is assigned to each edge of this graph. Graph G is defined as \(G = \{V, E(w)\}\), where V is the set of vertices (the sensor nodes), E is the set of edges (the transmission links), and w is a function from E to the set of positive real numbers. The weight of an edge \((S_i,S_j)\) is represented by \(w(S_i,S_j)\). The length of a path in graph G is defined as the total weight of the edges that make up the route. The weight of an edge \((S_i,S_j)\) (which is shown in Fig. 5) is calculated as follows:

$$\begin{aligned} w(S_i,S_j) = \frac{RE_{S_i}}{D_{S_i,S_j}}. \end{aligned}$$
(14)

In Eq. 14, \(RE_{S_i}\) and \(D_{S_i,S_j}\) represent the residual energy of node \(S_i\) and the distance between nodes \(S_i\) and \(S_j\), respectively. The distance is estimated as in [29], using the log-normal shadowing radio propagation model (LNSM): the distance between two sensor nodes is estimated from the signal strength of the received frames.

Fig. 5 The weight of edge \((S_i,S_j)\)

Next, the sum of weights for edges (SWE) connected to each node is calculated. We define function \(SWE(S_i)\) as the sum of the weights of the edges connected to node \(S_i\). A node whose \(SWE(S_i)\) is maximal is considered a proper candidate for being CH. \(SWE(S_i)\) is calculated by Eq. 15 as follows:

$$\begin{aligned} SWE(S_i) = w(S_i,S_j)+\cdots +w(S_i,S_h)= \sum \limits _{d=1}^{j,h} w(S_i,S_d). \end{aligned}$$
(15)

A higher edge weight means that data transmission over that edge is more beneficial and needs less energy.

Fig. 6 An example of a graph and constructed edges between its nodes

Figure 6 shows a graph which is formed among the sensor nodes (according to the radio communication range and neighborhood relations). The weights allocated to the edges are denoted by \(w(S_i,S_j)\), where \(S_i\) is the source node and \(S_j\) is the destination node. Note that, in general, \(w(S_i,S_j) \ne w(S_j,S_i)\), because the weight depends on the residual energy of the source node. The following example explains how SWE is calculated. The weights of all the edges are shown in Fig. 6 and are also presented in Table 1.

Table 1 Weight values of the edges

It should be noted that all the values have been normalized to the interval [0,1]. SWE for each node can be calculated as follows:

$$\begin{aligned} \begin{aligned} SWE(S_1)&= w(S_1,S_2)+w(S_1,S_4)= 0.8 \\ SWE(S_2)&= w(S_2,S_1)+w(S_2,S_3)\\&\quad + w(S_2,S_4)+w(S_2,S_5)= 2.6 \\ SWE(S_3)&= w(S_3,S_2)+w(S_3,S_5)= 0.8 \\ SWE(S_4)&= w(S_4,S_1)+w(S_4,S_2)\\&\quad +w(S_4,S_6)+w(S_4,S_7)=3.2\\ SWE(S_5)&= w(S_5,S_2)+w(S_5,S_3)\\&\quad +w(S_5,S_7)+ w(S_5,S_8)= 2.8\\ SWE(S_6)&= w(S_6,S_4)+w(S_6,S_7)= 0.5\\ SWE(S_7)&= w(S_7,S_4)+w(S_7,S_5)\\&\quad +w(S_7,S_6)+w(S_7,S_8)= 1.8\\ SWE(S_8)&= w(S_8,S_5)+w(S_8,S_7)= 1.0.\\ \end{aligned} \end{aligned}$$
(16)

For example, if we assume that the number of needed CHs is 2, then according to the SWE values, nodes \(S_4\) and \(S_5\) are, in that order, the most appropriate candidates in terms of SWE. Therefore we have:

$$\begin{aligned} \text {Maximum} \; SWE = \left\{ \begin{array}{l l} SWE(S_4)=3.2 \\ SWE(S_5)=2.8. \end{array} \right. \end{aligned}$$
(17)

SWE is used as one of the criteria for selecting the optimal CH because it leads to an affordable transmission cost between the members and the CHs. It also leads to the creation of more appropriately centralized clusters.
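The SWE criterion can be sketched in Python as follows. The helper names and the dictionary-based graph layout are assumptions made for illustration. Because the individual edge weights of Table 1 are not reproduced here, the ranking example reuses the SWE totals of Eq. 16 directly.

```python
def edge_weight(residual_energy_i: float, distance_ij: float) -> float:
    """w(S_i, S_j) of Eq. 14: residual energy of S_i over the distance to S_j."""
    return residual_energy_i / distance_ij


def swe(node, neighbors: dict, residual: dict, dist: dict) -> float:
    """SWE(S_i) of Eq. 15: sum of the weights of the edges leaving `node`."""
    return sum(edge_weight(residual[node], dist[(node, nb)])
               for nb in neighbors[node])


def top_k_by_swe(swe_values: dict, k: int) -> list:
    """The k nodes with the largest SWE, best first."""
    return sorted(swe_values, key=swe_values.get, reverse=True)[:k]


# SWE totals taken from Eq. 16 (node index -> SWE).
SWE_EQ16 = {1: 0.8, 2: 2.6, 3: 0.8, 4: 3.2, 5: 2.8, 6: 0.5, 7: 1.8, 8: 1.0}
best_two = top_k_by_swe(SWE_EQ16, 2)   # nodes S_4 and S_5, as in Eq. 17
```

Since the weight uses the residual energy of the source node only, `edge_weight` is generally not symmetric, which matches the remark that \(w(S_i,S_j) \ne w(S_j,S_i)\).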

C. The node should not be an outlier. We define function \(O(S_i)\) to indicate whether the node is an outlier or not. \(O(S_i)\) is defined by Eq. 18 as follows:

$$\begin{aligned} O(S_i) = \left\{ \begin{array}{ll} 1 &{} \quad \text {if} \; S_i\;\text { is an outlier node} \\ \\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(18)

If the BS encounters a node which is located in a blind spot, it ignores that node while selecting the CHs. We define the set of nodes which are in the blind spot by Eq. 19 as follows:

$$\begin{aligned} B=[b_1,b_2,\ldots ,b_w]. \end{aligned}$$
(19)

Also, in order to determine whether a node is in the blind spot or not, we define Eq. 20:

$$\begin{aligned} BSF_{S_i}=\frac{NON_{S_i}}{n}. \end{aligned}$$
(20)

where \(BSF_{S_i}\) is the blind spot finder, which is defined as the number of neighbors of a node (\(NON_{S_i}\)) divided by the total number of nodes (n). A node is an outlier and lies in the blind spot if its \(BSF_{S_i}\) value does not exceed a threshold level determined by the designer. We call the region where such a node is located the blind spot. Furthermore, we can write Eq. 21 as follows:

$$\begin{aligned} \left\{ \begin{array}{l l} S_i \; \text {is outlier} &{} \quad if \; BSF_{S_i} \le \text {threshold} \\ \\ S_i \; \text {is not outlier} &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(21)

This criterion is used because nodes which cover a large number of other nodes have a better chance of becoming CH. Thus, the surface area covered by the CHs increases, which makes the network more stable.
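The outlier test of Eqs. 18, 20 and 21 can be sketched in Python as follows. The function names, the neighbor counts and the threshold value used in the example are hypothetical; the text leaves the threshold to the designer.

```python
def bsf(num_neighbors: int, n: int) -> float:
    """Blind spot finder of Eq. 20: neighbors of a node over the total node count."""
    return num_neighbors / n


def is_outlier(num_neighbors: int, n: int, threshold: float) -> bool:
    """Eq. 21: a node is an outlier if its BSF does not exceed the threshold."""
    return bsf(num_neighbors, n) <= threshold


def outlier_flags(neighbor_counts: dict, threshold: float) -> dict:
    """O(S_i) of Eq. 18 for every node: 1 for outliers, 0 otherwise."""
    n = len(neighbor_counts)
    return {node: int(is_outlier(count, n, threshold))
            for node, count in neighbor_counts.items()}
```

With the hypothetical counts `{1: 0, 2: 3, 3: 1, 4: 4}` and a threshold of 0.25, nodes 1 and 3 are flagged as outliers and are ignored by the BS during CH selection.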

From the point of view of the BS, a node which satisfies all three criteria is selected as an appropriate CH. Algorithm 1 describes the mentioned steps.

Algorithm 1

5.2 The initialization in CGC protocol

The structure of each chromosome in the CGC protocol is illustrated in Fig. 7.

Fig. 7 The structure of chromosomes in CGC protocol

Assumptions: Each chromosome is denoted in the form of \(\{g_1,g_2,g_3,\ldots ,g_L\}\), with genes \(g_x\), \(x=1,2,3,\ldots ,L\). Furthermore, n represents the number of nodes in the network. L denotes the length of the chromosomes, which is also the number of needed CHs. \(g_x\) is the \(x{\mathrm {th}}\) gene, which includes the characteristics of the desired CH together with its associated ID. \(g_x\) carries the following characteristics:

  1. The node’s ID (from 1 to n).

  2. The residual energy of node \(S_i\), which is shown by \(E(res)_{S_i}\).

  3. The value of \(SWE(S_i)\).

  4. \(O(S_i)\) (outlier nodes and non-outlier nodes are shown by 1 and 0, respectively).

Hence, \(g_x\) might be written as follows:

$$\begin{aligned} g_x= ID. \left\{ \begin{array}{l@{\quad }l} E(res)_{S_i} &{} which \; E(res)_{S_i} > E(avr) \\ SWE(S_i) &{} {which} \; \text {is maximum between other }SWE \hbox {s} \\ O(S_i) &{} which \; O(S_i)=0. \end{array} \right. \end{aligned}$$
(22)

Random selection may be utilized to generate the initial population.
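The chromosome structure and a random initialization can be sketched in Python as follows. The `Gene` class, its field names, and the choice of drawing L distinct nodes per chromosome are assumptions made for illustration; the text only states that each gene carries the node ID, the residual energy, the SWE value and the outlier flag.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    node_id: int            # node's ID, from 1 to n
    residual_energy: float  # E(res) of the node
    swe: float              # SWE of the node
    outlier: int            # O(S_i): 1 for outlier, 0 otherwise


def random_chromosome(genes: list, length: int, rng: random.Random) -> list:
    """One chromosome: `length` distinct candidate CHs drawn at random."""
    return rng.sample(genes, length)


def initial_population(genes: list, length: int, pop_size: int, seed: int = 0) -> list:
    """Randomly generated initial population of `pop_size` chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(genes, length, rng) for _ in range(pop_size)]
```

Here `length` plays the role of L, the number of needed CHs, and must not exceed the number of available genes.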

5.3 Fitness function in CGC protocol

Now, the current population must be evaluated to determine the survivors. For this purpose, a fitness function is used. A fitness function is an objective function which is to be minimized or maximized. Since routing is a two-level procedure (the first level between nodes and CHs, and the second level between CHs and BS), both levels must be considered in the fitness function.

The proposed fitness function (F) is a function of all three parameters described above and is defined as follows:

$$\begin{aligned} \begin{aligned} F =&\sum _z Min \big |(EC_{z-1}-{EC_z}) + SWE_z + BSF_z\big | \\&+ \sum _z Min \big |(EN_{z-1}-{EN_z}) + WE_z\big |. \end{aligned} \end{aligned}$$
(23)

In Eq. 23, \(EC_{z}\) represents the average energy of the cluster in the current round (round z) and \(EC_{z-1}\) represents the average energy of the cluster in the previous round (round \(z-1\)). \(EN_z\) and \(EN_{z-1}\) are the average total network energy in the current and previous rounds, respectively. Since the average energy of the cluster and the average energy of the network in round z are always less than or equal to their values in round \(z-1\), it is desired that the difference between them is minimized. Also, \(SWE_z\) represents the sum of the weights of the edges connected to the CH, \(BSF_z\) describes whether the CH is an outlier or a non-outlier, and \(WE_z\) represents the weight of the edge between the CH and the BS.
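One possible reading of Eq. 23 is sketched below in Python. It assumes that the Min operator simply marks the quantity that the GA minimizes, so the function only evaluates the two absolute-value terms and sums them over consecutive rounds. The function name `fitness` and the dictionary keys are hypothetical.

```python
def fitness(rounds):
    """Evaluate the summed terms of Eq. 23 for a sequence of rounds.

    `rounds` is a list of dicts, one per round, with the keys
    EC, EN, SWE, BSF and WE (hypothetical names mirroring the symbols).
    """
    total = 0.0
    for prev, cur in zip(rounds, rounds[1:]):
        # First-level term: cluster energy drop plus SWE and outlier flag.
        cluster_term = abs((prev["EC"] - cur["EC"]) + cur["SWE"] + cur["BSF"])
        # Second-level term: network energy drop plus CH-to-BS edge weight.
        network_term = abs((prev["EN"] - cur["EN"]) + cur["WE"])
        total += cluster_term + network_term
    return total
```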

5.4 The genetic operators in CGC protocol

In this section, the genetic operators used in the proposed protocol are explained. These operators are selection, crossover, and mutation.

In individual selection, some of the current chromosomes are selected, with respect to their desirability, to generate the new population. They may be selected randomly, because a good offspring may result from the combination (crossover) of a good and a bad parent, or even of two bad parents; random selection therefore does not harm later generations. Nevertheless, in the CGC protocol the roulette wheel model [21] is utilized. The main ideas are: (1) chromosomes with more competence have a greater chance of being selected, and (2) the chance of selection is proportional to the fitness of the chromosome. In roulette wheel selection, the probability that individual i will be selected is obtained as follows:

$$\begin{aligned} p_i=\frac{Fitness_i}{\sum _{q=1}^{s} Fitness_q}. \end{aligned}$$
(24)

In Eq. 24, \(p_i\) is the probability that chromosome i will be selected, \(Fitness_i\) represents the fitness function value of chromosome i, \(\sum _{q=1}^{s} Fitness_q\) represents the sum of the fitness values of all chromosomes in the population, and s represents the size of the population. The steps of individual selection using the roulette wheel method are shown by Algorithm 2.

figure f

A random number r is generated (where \(0 \le r < 1\)). Sum is the sum of the fitness values of all chromosomes in the population. The loop goes through the population and accumulates each chromosome's share of Sum, i.e. the probabilities \(p_i\) of Eq. 24. When the accumulated value becomes greater than r, the loop stops and returns the current chromosome.
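The selection loop can be sketched in Python as follows. This is a minimal illustration, assuming non-negative fitness values with a positive sum; the function name `roulette_select` and the `rng` parameter are hypothetical.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Return one individual with probability proportional to its fitness."""
    total = sum(fitnesses)       # "Sum" in the description above
    r = rng.random()             # 0 <= r < 1
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit / total    # add p_i from Eq. 24
        if cumulative > r:
            return individual
    # Guard against floating-point round-off in the last step.
    return population[-1]
```

Normalizing by the total keeps r in the interval [0, 1); an equivalent variant draws r from [0, Sum) and accumulates the raw fitness values.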

When the parent chromosomes are selected, the crossover operator is applied to them with probability \(P_c\). This operator combines the parents and generates new chromosomes (offspring). During the crossover operation, new information is usually extracted from the information existing in the current chromosomes (the chromosomes in the parent population). In the CGC protocol, one-point crossover [21] is exploited. The one-point crossover operator breaks two chromosomes at a random point and exchanges the broken parts between them. In other words, in one-point crossover one crossover location \(l \in \{1,2,\ldots ,m_v-1\}\) is chosen at random, where \(m_v\) is the number of variables of an individual (chromosome). Then, the variables are exchanged between the individuals from this point onward and two new offspring are produced. Figure 8 depicts the generation of offspring chromosomes from parent chromosomes using the one-point crossover operator in the CGC protocol.

Fig. 8
figure 8

Generating offspring by applying the one-point crossover operator

Consider two parents [101011001011] and [000110101001]. If we assume that the chosen crossover location is \(l=3\), then after one-point crossover the new individuals are \(\text {Offspring1} = [101|110101001]\) and \(\text {Offspring2} = [000|011001011]\). After the two new chromosomes are generated, the initial chromosomes and the generated ones are called parents and offspring, respectively.
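The worked example can be reproduced with a short Python sketch. The function names are hypothetical, and chromosomes are represented here as strings purely for readability.

```python
import random


def one_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length chromosomes after `point`."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


def random_crossover_point(length, rng=random):
    """Choose l uniformly from {1, ..., length - 1}."""
    return rng.randint(1, length - 1)


offspring1, offspring2 = one_point_crossover("101011001011", "000110101001", 3)
print(offspring1)  # 101110101001
print(offspring2)  # 000011001011
```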

Mutation [21] occurs while genetic information is transferred from parents to offspring. The mutation operator is applied to each chromosome resulting from the crossover operator (offspring chromosome). For each bit of the chromosome, a random number is generated. If this random number is less than the mutation probability \(P_m\), mutation occurs in that bit; otherwise, it does not. In the CGC protocol, a proper node which satisfies the criteria for becoming a CH (according to the criteria defined earlier) is selected randomly. If this node is not already among the nodes inside the chromosome, the mutation occurs and the node is placed in one of the cells of the chromosome.
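Both mutation variants described above can be sketched in Python. The function names are hypothetical. For the CGC variant, the sketch assumes that the chromosome is a list of CH node IDs, that `eligible_ids` already contains only nodes satisfying the CH criteria, and that the cell to overwrite is chosen uniformly at random (the text does not specify which cell is used).

```python
import random


def bit_flip_mutation(bits, p_m, rng):
    """Generic mutation: flip each bit independently with probability p_m."""
    return [b ^ 1 if rng.random() < p_m else b for b in bits]


def ch_mutation(chromosome, eligible_ids, rng):
    """CGC-style mutation on a list of CH node IDs."""
    candidate = rng.choice(eligible_ids)
    if candidate in chromosome:
        # The node is already a CH in this chromosome: no mutation.
        return list(chromosome)
    mutated = list(chromosome)
    # Assumption: the replaced cell is chosen uniformly at random.
    mutated[rng.randrange(len(mutated))] = candidate
    return mutated
```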

5.5 Algorithm termination in CGC protocol

In this step, all members of the new population are evaluated. If the requirements are met, the algorithm terminates. Otherwise, the new population is utilized as the initial population of the next round and all the mentioned steps are repeated. Termination criteria in GA may differ: for example, the algorithm execution time, a limited number of iterations, or a lack of change in the optimum answer after a specific number of iterations can each serve as a criterion for algorithm termination. In the CGC protocol, the termination criterion is a lack of change in the optimum solution after a specific number of iterations. The pseudo-code of all steps in the CGC protocol is presented by Algorithm 3.

figure g
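The termination criterion (no change in the optimum solution for a specific number of iterations) can be illustrated by the following Python sketch. It is not a transcription of Algorithm 3: the names `run_until_stalled`, `step`, `evaluate` and `patience` are hypothetical, the fitness is assumed to be minimized, and `max_iters` is an added safety cap.

```python
def run_until_stalled(step, evaluate, population, patience, max_iters=1000):
    """Iterate `step` until the best fitness stops improving.

    step:      function mapping a population to the next population
               (selection, crossover and mutation combined)
    evaluate:  fitness of one chromosome (assumed: lower is better)
    patience:  number of consecutive iterations without improvement
               after which the loop terminates
    """
    best = min(evaluate(c) for c in population)
    stalled = 0
    iters = 0
    while stalled < patience and iters < max_iters:
        population = step(population)
        iters += 1
        current = min(evaluate(c) for c in population)
        if current < best:
            best, stalled = current, 0   # improvement: reset the counter
        else:
            stalled += 1                 # no change in the optimum
    return population, best, iters
```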

6 Upper level routing using onion approach

A WSN with a large number of nodes is called a large-scale WSN. Energy consumption management at such a scale is one of the most important issues. It should be noted that the strategies used to manage communications among nodes in small-scale WSNs are not efficient in large-scale WSNs. In order to use the CGC protocol in large-scale WSNs for upper level routing (between CHs), the protocol is extended with a new concept called the “Onion Approach”. After the optimal CHs are determined and the clusters are formed by the CGC protocol, the onion approach is applied: a layered network (like the layers of an onion) is formed and the CHs are placed in these layers. Consider Fig. 9. In this figure, after the optimal CHs are selected and the clusters are formed, the onion layering operation is performed. Each of the blue squares indicates a CH node. The onion approach reduces the communication overhead between CHs by dividing the network into several onion layers.

Fig. 9
figure 9

How the CHs are placed in different onion layers

6.1 Onion layering operations

6.1.1 Primary assumptions

Assume a network with m CH nodes. The onion layers are a combination of the CHs and can be defined in the form \((l_1,l_2,\ldots ,l_k)\), where k denotes the number of layers and \(1 \le k \le o\).

In the proposed onion layering method, after the selection of optimal CHs and cluster formation, each CH node placed in one of the onion layers knows only the CHs placed in the next and previous layers and communicates only with them. In large-scale networks, as the number of nodes increases dramatically, the communication overhead between them also increases. Thus, providing a mechanism to reduce the communication overhead between the CH nodes is crucial. The onion approach, by dividing the network into separate layers, reduces this overhead: a CH node that wants to communicate with others does not deal with a huge number of CH nodes, but only with the CHs placed in the next and previous layers. Also, it is sufficient for each CH node to identify the CH nodes of its neighboring layers, and there is no need to identify the state of every CH node in every layer of the network. For example, assume that layer \(l_k\) includes m CH nodes. In this case, these nodes only identify the CH nodes in layers \(l_{k-1}\) and \(l_{k+1}\).

6.1.2 Onion layering

Assume that the procedures of selecting optimal CHs and forming clusters through an evolutionary process by the CGC protocol are already done. Now, we intend to do the layering operation on the clustered network using the onion approach. Assume the radio communication radius of each CH node is \(R_c\). If the CH nodes in each layer of the onion want to make a single-hop communication with the CH nodes in the next and previous layers, the radius of different onion layers is determined by the following proposed equation:

$$\begin{aligned} R_{l_k}= \left\{ \begin{array}{l l l} \frac{R_c}{3} &{} \quad if \; k=0 \\ \\ \frac{R_c}{3}+ \left( k \times \frac{R_c}{3}\right) + (k-1) \times 0.5 \frac{R_c}{3} &{} \quad \text {otherwise}. \end{array} \right. \end{aligned}$$
(25)

In Eq. 25, \(R_{l_0}\) is the radius of the \(0{\mathrm {th}}\) layer (the innermost layer) and \(R_{l_k}\) is the radius of the \(k{\mathrm {th}}\) layer. By using Eq. 25, the onion layers are created such that the CH nodes in each layer can communicate with the CH nodes of their neighboring layers in a single hop. Consider Fig. 10. In this figure, each CH node, in its worst state (at the edges of the circles), is at a distance of \(R_c\) from the CH nodes in each of its neighboring layers (i.e. if two CHs want to stay connected, their distance should be at most \(R_c\)). For example, the distance between nodes \(c_1\) and \(c_4\) is \(R_c\) in the worst case. Therefore, in Eq. 25, \(R_c\) must always be divided by three so that every CH node in each layer has the CH nodes in the next and previous layers (neighboring layers) within its radio communication radius.

Fig. 10
figure 10

Neighboring layers and the distance between them

Assume that the radio communication radius of each node is 30 m (\(R_c=30\)). Thus, the radius of the innermost layer of the onion (layer \(l_0\)) is 10 m (\(R_{l_0}=10\)). Furthermore, the radii of the next layers are calculated using Eq. 25. The radius of layer \(l_1\) is:

$$\begin{aligned} R_{l_1}= & {} \frac{R_c}{3}+\left( 1 \times \frac{R_c}{3}\right) + (1-1) \times 0.5 \frac{R_c}{3}\nonumber \\= & {} 10+10+0=20. \end{aligned}$$
(26)

Also, the radii of layers \(l_2\) and \(l_3\) are calculated using the same approach by Eqs. 27 and 28, as follows:

$$\begin{aligned} R_{l_2}= & {} \frac{R_c}{3}+\left( 2 \times \frac{R_c}{3}\right) + (2-1) \times 0.5 \frac{R_c}{3}\nonumber \\= & {} 10+20+5=35. \end{aligned}$$
(27)
$$\begin{aligned} R_{l_3}= & {} \frac{R_c}{3}+\left( 3 \times \frac{R_c}{3}\right) + (3-1) \times 0.5 \frac{R_c}{3}\nonumber \\= & {} 10+30+10=50. \end{aligned}$$
(28)
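The values in Eqs. 26 to 28 can be checked with a direct Python transcription of Eq. 25. The function name `layer_radius` is hypothetical.

```python
def layer_radius(k, r_c):
    """Radius of onion layer l_k according to Eq. 25."""
    base = r_c / 3
    if k == 0:
        return base
    return base + k * base + (k - 1) * 0.5 * base


# Radii of layers l_0 to l_3 for R_c = 30 m
print([layer_radius(k, 30) for k in range(4)])  # [10.0, 20.0, 35.0, 50.0]
```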

Figure 11 shows the created layers of the onion. As can be seen, the layers are created such that every CH node in each layer can cover the CH nodes in its neighboring layers. This procedure continues until all nodes in the network are covered by the layers. If a CH node wants to communicate with another CH node in its neighboring layers and there are no CHs in those layers, they are considered null layers and the first layers after them are assumed to be the new neighboring layers.

Fig. 11
figure 11

Creation of the onion layers using the proposed equation

6.1.3 Routing between onion layers

To perform the routing procedure between CHs (onion layers), the network is divided into two parts, “northern” and “southern”. It should be mentioned that the BS knows the coordinates of the network (in our simulations it is \(200\times 200\, \mathrm {m^2}\)) and, according to these coordinates, determines whether each node belongs to the northern or the southern part of the network and then informs each node of its position. As a result, each node is aware of whether it is in the northern or the southern part. In addition, we used the method of [3] for communication between CHs in order to select the next relay node in the routing procedure.

In order to make an intelligent decision when dividing the network into two separate parts (northern and southern), we propose a practical solution which can be exploited in any topology. For example, in some applications of WSNs the BS is situated among the sensor nodes, whereas in other applications it is placed outside the network area (to the north, west, south or east). For this reason, a solution is needed which can be used irrespective of the BS’s location. Since our method is a centralized protocol, the BS knows the size of the network. Let \(y_{network}\) be the width of the network. By dividing \(y_{network}\) by 2, the BS is able to split the network into two sections. Furthermore, the BS needs to notify each sensor node of its position; this can be done in the set-up phase using the following equation:

$$\begin{aligned} \text {Position}_{S_i}= \left\{ \begin{array}{lll} \text {Put }S_i\text { in northern part} &{} \quad \text {if} \; y_{S_{i}} \le \frac{y_{network}}{2} \\ \\ \text {Put }S_i \text { in southern part} &{} \quad \text {otherwise}. \end{array} \right. \end{aligned}$$
(29)

In Eq. 29, \(y_{S_i}\) indicates the y component of node \(S_i\) and \(y_{network}\) represents the width of the network. If \(y_{S_{i}} \le y_{network}/2\), the node is considered to be situated in the northern part; otherwise, it is assumed to be placed in the southern part.
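Eq. 29 amounts to a single comparison, as the following Python sketch shows. The function name `network_part` and the labels "N" and "S" are hypothetical choices for illustration.

```python
def network_part(y, y_network):
    """Classify a node by its y coordinate according to Eq. 29."""
    return "N" if y <= y_network / 2 else "S"


# In a 200 m wide network, a node with y = 100 falls in the northern part.
print(network_part(100, 200))  # N
print(network_part(150, 200))  # S
```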

Fig. 12
figure 12

An example of routing procedure between the CHs in northern and southern parts

The northern and southern parts of the network are shown with N and S, respectively. Consider node C as a CH node. Assume that this node is in part N and wants to send packet P to BS. Packet P must traverse the layers in ascending order to reach the last layer. After packet P reaches the last layer, as there is no other layer after its current layer, the packet is directly sent to BS.

Now, assume that node C is placed in part S and wants to send packet P to the BS. When node C is in part S, the routing process is done in reverse, i.e. packet P must go through the routing process in descending order. This process continues until the packet reaches a CH node in part N. After the arrival of the packet in part N, the routing process continues in ascending order as described before. Dividing the network into southern and northern parts and performing ascending and descending routing allow a CH node which is in part S, and whose radio communication range does not allow transferring the packet directly to the BS, to send its data using the intermediate layers of the onion. For example, consider Fig. 12. In order to perform the routing process, each CH node only knows the CH nodes in its neighboring layers. According to Fig. 12, assume that node \(c_1\) wants to send a packet to the BS from the innermost layer of the onion (layer \(l_0\)). This node can send the packet to node \(c_2\) or \(c_3\). This process is repeated by node \(c_2\) or \(c_3\) until the packet finally arrives at node \(c_6\). After the packet reaches node \(c_6\), which is located in layer \(l_3\), node \(c_6\) does not see any other layer after its own, so it realizes that the next destination is the BS and sends the packet to it. The important point here is that the routing process is done in ascending order, i.e. from layer \(l_0\) to \(l_3\).

Now, assume that node \(c_7\) wants to send a packet to the BS. According to the previous discussion, because \(c_7\) is in the last layer (\(l_3\)) and sees no other layer after its own, it would have to send the packet directly to the BS, but this is not possible. Therefore, if a node is in the southern part of the network (like node \(c_7\)) and wants to send a packet to the BS, the routing operation is done in reverse (descending order), i.e. the packet is sent to layer \(l_2\), then \(l_1\) and finally \(l_0\). When the packet arrives at layer \(l_0\), the routing process continues in ascending order, as discussed for node \(c_1\). The steps of the onion layering operation are shown by Algorithm 4.

figure h
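The ascending and descending routing order described above can be sketched at the level of layers in Python. This is not a transcription of Algorithm 4: the function name `layer_route` is hypothetical, the sketch assumes that a packet from part S crosses into part N at layer \(l_0\) (as in the \(c_7\) example), and it ignores null layers and the choice of the relay CH within a layer.

```python
def layer_route(start_layer, part, last_layer):
    """Sequence of layer indices visited by a packet, ending at the BS.

    part: "N" for a CH in the northern part, "S" for the southern part.
    """
    if part == "N":
        # Ascending order up to the last layer, then directly to the BS.
        return list(range(start_layer, last_layer + 1)) + ["BS"]
    # Southern part: descend to l_0 first, then ascend as in part N.
    descending = list(range(start_layer, -1, -1))
    ascending = list(range(1, last_layer + 1))
    return descending + ascending + ["BS"]


print(layer_route(0, "N", 3))  # [0, 1, 2, 3, 'BS']
print(layer_route(3, "S", 3))  # [3, 2, 1, 0, 1, 2, 3, 'BS']
```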

7 Simulation results and analysis

In this section, we intend to analyze the CGC protocol in terms of time complexity and performance evaluation.

7.1 Time complexity

Time complexity analysis can be used to predict the growth behavior of an algorithm and is useful for analyzing and optimizing its real-time efficiency. In this paper, we define big Omicron (big-O) [30] as the set given by Eq. 30:

$$\begin{aligned} O(f(n))=\{g \mid \exists c>0,\exists n_0>0,\forall n\ge n_0:0 \le g \le cf(n)\} \end{aligned}$$
(30)

In Eq. 30, \(g \in O(f(n))\) if and only if there exist positive constants c and \(n_0\) such that for all \(n \ge n_0\), the inequality \(0 \le g \le cf(n)\) is satisfied. We say that g is big-O of f(n) if f(n) is an asymptotic upper bound for g. In terms of time complexity analysis, we write \(T(n) \in O(f(n))\) and say that the algorithm has complexity of order f(n). This means that the time taken to solve a problem of size n is in the set of functions described by O(f(n)).

We denote the population size in our method by a positive integer \(L_0\), which also shows the number of needed CHs. For the worst-case scenario, we assume that the whole population is selected in the selection process. This means that \(L_0\) solutions are considered for the crossover operation. A fraction of the population is selected for the crossover process, which produces additional solutions to be added to the population. As mentioned before, \(P_c\) and \(P_m\) are the crossover and mutation probabilities, where \(0 \le P_c \le 1\) and \(0 \le P_m \le 1\); the number of additional solutions is \(L_0P_c\), making the total number of solutions in the population so far \(L_0 + L_0P_c\). Then, a portion of this population is selected for the mutation process. The number of solutions added to the population is \((L_0 + L_0P_c) P_m\). If we assume that N is the number of sensor nodes, then we can write Eq. 31 as follows:

$$\begin{aligned} T(n)=L_0 N(1+ P_c + P_m+ P_cP_m). \end{aligned}$$
(31)
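The step from the solution counts to Eq. 31 can be spelled out as follows, under the assumption (not stated explicitly in the text) that processing each solution costs work proportional to N. The factored form and the resulting bound follow from \(P_c \le 1\) and \(P_m \le 1\):

$$\begin{aligned} T(n)&=N\big [L_0 + L_0P_c + (L_0 + L_0P_c)P_m\big ] \\&=L_0N(1+P_c+P_m+P_cP_m) \\&=L_0N(1+P_c)(1+P_m) \le 4L_0N. \end{aligned}$$

Under this reading, \(T(n) \in O(L_0N)\).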

The running time of the proposed method has also been compared with that of LEACH-SAGA [20]. LEACH-SAGA is chosen for this comparison because it is more efficient than the other methods with which the proposed method is compared in this paper. The results have been evaluated with respect to the number of nodes in the network, using groups of 25, 50, 75, 100, 125, 150, 175 and 200 nodes. The running time of each method is then reported as a function of the number of nodes.

Fig. 13
figure 13

Comparison of running time of the proposed method and LEACH-SAGA

As is evident from Fig. 13, the running time of the CGC protocol grows with a gentle slope, whereas the running time of LEACH-SAGA grows with a steep slope. It can be argued that the proposed method is superior to LEACH-SAGA in terms of running time. As a matter of fact, the CGC protocol, by utilizing GA for CH selection according to the distances of candidate CHs to other sensor nodes, requires less computation time than LEACH-SAGA.

7.2 Performance evaluation

For the sake of comparison, we compared our method with four existing methods: LEACH [12] (a distributed clustering method), LEACH-SAGA [20] (a centralized clustering method based on simulated annealing and GA), LEACH-C [13] (a centralized clustering method), and GA-based [31] (a distributed clustering method based on GA). The performance of the CGC protocol is compared with these four methods in terms of the number of dead nodes, first node death, last node death, data packets received by the BS, and energy consumption of the network. Since LEACH and LEACH-C are distributed and centralized clustering methods, respectively, they are a good choice for comparing our method with the two kinds of traditional clustering protocols. On the other hand, LEACH-SAGA and GA-based are centralized and distributed genetic-based clustering methods, respectively. We find it rational to compare our method with two meta-heuristic (genetic-based) methods, one of which is centralized and the other distributed.

The number of nodes in the simulated network is 200. Moreover, according to the literature [32], the optimal number of needed CHs is 10. In our scenario, the onion layering has been done and the radio communication radius of the sensor nodes is 30 m. If the number of neighboring nodes of a node is less than five percent of the total number of nodes, it is considered an outlier node. The simulated network has a sensing field of \(200\times 200\,\mathrm {m^2}\). The simulation and genetic parameters are presented in Table 2. The number of iterations for the termination condition of GA is 1000. We assume that the nodes have no mobility, since nodes are fixed in most applications of WSNs.
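The outlier rule stated above can be sketched in Python. The sketch assumes that a neighbor is any other node within the radio communication radius, which the text does not state explicitly; the function name `is_outlier` and its parameters are hypothetical.

```python
import math


def is_outlier(i, positions, r_c=30.0, fraction=0.05):
    """True if node i has fewer neighbors than `fraction` of all nodes.

    positions: list of (x, y) coordinates, one per node
    r_c:       radio communication radius in meters
               (assumption: neighbors are the nodes within r_c)
    """
    neighbors = sum(
        1
        for j, other in enumerate(positions)
        if j != i and math.dist(positions[i], other) <= r_c
    )
    return neighbors < fraction * len(positions)
```

With 200 nodes, the threshold is 10, so a node with fewer than 10 neighbors would be treated as an outlier under this reading.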

Table 2 Simulation parameters

Fig. 14
figure 14

Number of dead nodes in different rounds of the network

In Fig. 14, the performance of LEACH, LEACH-C, LEACH-SAGA, GA-based, and the CGC protocol is evaluated in terms of the number of dead nodes. In our simulation, the number of nodes is 200. As is evident from Fig. 14, as the rounds proceed, the first node death in the CGC protocol occurs later than in the LEACH, LEACH-C, LEACH-SAGA, and GA-based protocols. For example, after round 1050 the number of dead nodes increases to 197 in LEACH, 121 in LEACH-C, 59 in LEACH-SAGA, and 102 in GA-based, whereas in CGC it increases to only 40. Since the CGC protocol always tries to select as CH a node which satisfies the three aforementioned criteria, the energy consumption is distributed among the nodes. As a result, no node is exploited beyond its capability, which in turn optimizes the energy consumption of the whole network. Simulation results revealed that in a network which uses the LEACH, LEACH-C, LEACH-SAGA, or GA-based protocol, no node remains alive after rounds 1065, 1091, 1157, and 1105, respectively. By contrast, a network employing the CGC protocol still has 15 alive nodes after round 1157. In other words, when all nodes of a network using LEACH, LEACH-C, LEACH-SAGA, or GA-based are dead, almost 15 nodes in the network using the CGC protocol are still alive. Our simulations revealed that these 15 nodes are very close to the BS and are able to establish single-hop communication with it. Figure 15 illustrates this situation: the 15 alive nodes are shown by blue circles and the dead nodes by red circles. As can be seen, these 15 alive nodes are within a single-hop distance of the BS. This reveals that the CGC method is able to exploit the whole network’s potential in terms of the number of alive nodes.

Fig. 15
figure 15

The capability of CGC protocol to utilize network’s potential in terms of number of alive nodes

In Figs. 16 and 17, the performance of LEACH, LEACH-C, LEACH-SAGA, GA-based, and the CGC protocol is measured in terms of first node death (FND) and last node death (LND). Simulation results show that the CGC protocol is superior to LEACH, LEACH-C, LEACH-SAGA, and GA-based in terms of both FND and LND. As can be seen in Fig. 16, the first node death in LEACH, LEACH-C, LEACH-SAGA, GA-based, and CGC occurs in rounds 698, 731, 996, 810, and 1005, respectively. Also, Fig. 17 shows that the last node death in LEACH, LEACH-C, LEACH-SAGA, and GA-based occurs in rounds 1065, 1091, 1157, and 1105, respectively, whereas the last node death in CGC occurs in round 1200.

Fig. 16
figure 16

Comparison in terms of first node death

Fig. 17
figure 17

Comparison in terms of last node death

In Fig. 18, the performance of LEACH, LEACH-C, LEACH-SAGA, GA-based and the CGC protocol is evaluated in terms of the number of packets received by the BS. The number of data packets received within a certain time period is an important index for measuring the quality of network service. As is evident from Fig. 18, the simulation results clearly show that the CGC protocol is better than the LEACH, LEACH-C, LEACH-SAGA, and GA-based clustering protocols in terms of the number of packets received by the BS. This is because the CGC protocol is used to select the CHs and form the clusters. It can be argued that the more nodes are near their CH, the shorter the data transmission time.

Fig. 18
figure 18

Comparison in terms of total packets received by the BS

In Fig. 19, the performance of LEACH, LEACH-C, LEACH-SAGA, GA-based and the CGC protocol is evaluated in terms of energy consumption. It can be seen that the energy consumption is 88 J for LEACH in round 850, for LEACH-C in round 903, for LEACH-SAGA in round 948, for GA-based in round 933 and for the CGC protocol in round 1003. The energy consumption is 98 J for LEACH in round 953, for LEACH-C in round 1010, for LEACH-SAGA in round 1047, for GA-based in round 1096 and for the CGC protocol in round 1150. Simulation results show that the CGC protocol achieves a significant reduction in energy consumption. Since the CGC protocol tries to find the nodes with higher values of SWE as well as of residual energy, energy consumption is more balanced. Furthermore, the onion approach, by dividing the network into separate layers, reduces energy consumption and communication overhead between nodes.

Fig. 19
figure 19

Comparison in terms of energy consumption in different rounds of the network

In order to show the importance of the proposed onion approach and its significant effect on energy consumption, we measured the energy consumption of the whole network before and after applying the onion approach. Table 3 lists the energy consumption of the whole network in different rounds. Energy consumption is denoted by EC (in J) and onion layering by OL. As can be seen, onion layering reduces the energy consumption in each round. It is evident from Table 3 that this concept plays an important role in the network and can reduce its energy consumption considerably.

Table 3 The energy consumption of the whole network before and after onion layering in different rounds

8 Conclusion

In this paper, we proposed a centralized genetic-based clustering (CGC) protocol using the onion approach. The CHs are selected with respect to three criteria and based on GA. The GA is used to search a complicated search space and select optimum CHs. Furthermore, an innovative fitness function has been proposed; the chromosome which minimizes this function is selected by the BS and its nodes are introduced to the whole network as proper CHs. In fact, the proposed fitness function leads to appropriate chromosome selection. Moreover, the onion approach was introduced as a new concept. With this approach, the network is divided into several layers. The CHs are placed in separate onion layers and routing is done between them through these layers. The onion approach, by increasing the number of packets received by the BS, makes the CGC protocol suitable for large-scale WSNs. The simulation results clearly show that the CGC protocol considerably improves the network lifetime, reduces the energy consumption of the network, significantly improves packet delivery, and keeps nodes alive for a longer time. At the same time, we showed that the running time of the proposed method is lower than that of LEACH-SAGA.