1 Introduction

A wireless sensor network (WSN) is a major enabling environment for pervasive computing and emergency applications. Its growth is driven by the fusion of communication and data sensing models, and it is widely deployed in applications such as threat detection, health monitoring [1, 2], environmental tracking, and object monitoring. In general, a WSN comprises several statically placed nodes with limited energy [3, 4], limited communication capability [5], and limited processing power, connected over radio links of limited range. The nodes have little storage, and battery-powered multi-function sensors are used to read quantities such as humidity and temperature. In most scenarios, the nodes are placed in an ad hoc manner and communicate with neighboring nodes to form the network [6]. Because of the limited communication range, single-hop transmission is employed for sending the data packets [7]. Each sensor node is capable of data processing, sensing, and communication. The nodes capture data, record it, and forward it through wireless channels [8, 9]. A WSN contains a number of battery-powered nodes that are specifically designed to capture, track, and broadcast data to the sink. Energy management is therefore a serious concern in the WSN environment. Two classes of routing protocol are used in sensor networks: flat routing and hierarchical routing. Flat routing allows the nodes to send data packets directly to the base station (BS), whereas hierarchical routing divides the nodes into groups called clusters. Nodes are assigned to clusters according to their tasks. The low-level nodes in the hierarchy forward data to the cluster head (CH), while high-level nodes are responsible for sending the captured data to the BS [10,11,12,13].

Each cluster in the network contains one head node, the CH, which communicates with the other CHs in the network [14]. Since nodes require a large amount of energy to transfer data directly to the BS, a routing protocol is needed in a clustered WSN to find the best path from the CHs to the BS and minimize energy consumption [15]. The features of the routing mechanism include reliability, scalability, information aggregation, and fault tolerance [16, 17]. During clustering, nodes are grouped into clusters according to a set of common attributes. The highest-quality nodes are chosen as CHs. A CH is responsible for collecting information from its group members and forwarding it to higher-level nodes or to the BS, depending on whether transmission is single-hop or multi-hop. In single-hop transmission, the CH transmits the information directly to the BS, whereas in multi-hop transmission the CH sends the data to higher-level CHs, which relay it to the BS. Multi-hop transmission is commonly used in large-scale networks. The nodes are partitioned into two groups, CHs and common nodes [18]. The energy of the nodes is quickly exhausted by direct interaction between the nodes and the BS [19]. Hence, energy must be used optimally in the sensor network to achieve a longer lifetime and better performance [16, 20]. With energy efficiency in mind, researchers have concentrated on developing cluster-based routing methods [21] that increase scalability, load balancing, and network lifespan [22]. Hierarchical routing models offer multi-hop routing by forming several clusters, each with a CH. Single-hop routing uses little energy over short distances but more over longer distances, which degrades performance [23]. Creating an energy-efficient routing protocol that optimizes lifetime and resource utilization is a complex task in WSN [24, 25].

In [26], the Gravitational Search Algorithm (GSA) is used to determine the best CH among several sensor nodes, with node energy as the main selection factor. The objective function for CH selection includes intra-cluster distance and energy efficiency. Moreover, the cost function of the multi-hop mechanism [9] considers the residual energy of the next hop as well as its distance from the BS. In [27], the Harmony Search Algorithm (HSA) is designed to group the nodes and generate paths in the WSN. The fitness factors employed in clustering include energy, inter-cluster distance, intra-cluster distance, and node degree. The energy expended by a node is reduced by selecting nodes at a short distance for data communication [16]. In [28], evolutionary methods, such as the strength Pareto evolutionary model and the genetic algorithm, are developed for optimizing performance [29], temperature, and energy. This model produced a precise, high-quality Pareto optimal front [24]. In [30], a balanced and energy-efficient multi-hop algorithm (BEEMH) based on the Dijkstra algorithm is modeled for multi-hop routing in the WSN. In [31], a balanced multipath routing mechanism is modeled that uses the energy of nodes to find the optimal route. However, the performance of the method depends on the selection of optimal routes and a limited number of hops [32].

1.1 Problem definition and motivation

Clustering is an important process for prolonging network lifetime in WSN. The cluster head is a node that collects data from the sensors in its cluster and passes it to the base station; thus, it should be energy efficient. Many methods are in practice for cluster head selection; however, they are limited by issues such as:

  • Selecting an appropriate cluster head

  • Complexity of forming clusters in multiple levels

  • Reliability in routing path

  • Better residual energy and throughput

  • Link breakage

These limitations motivate the design of a method that is more efficient than the existing methods.

This research is designed to develop an effective CH selection method using the proposed TSBOA. The proposed TSBOA is derived by the incorporation of Tunicate Swarm Algorithm (TSA) and Butterfly Optimization Algorithm (BOA), which effectively generates the optimal solution by incorporating the swarming behavior of the optimization algorithm. The initial energy of the node is fed as input to the Deep LSTM classifier to compute the predicted energy. Accordingly, the process of selecting CH is done using the proposed TSBOA such that it depends on the fitness constraints, such as intra and inter-cluster distance, node initial energy, predicted energy, delay, and the LLT factor. Moreover, the energy consumption of nodes is computed by considering nodes transmitting energy and receiving energy. Finally, the route maintenance procedure is done by checking the reliability of the link.

The major contribution of the research is explained as follows:

  • The proposed Tunicate Swarm Butterfly Optimization Algorithm (TSBOA) is developed by combining Tunicate Swarm Algorithm (TSA) and Butterfly Optimization Algorithm (BOA) for CH selection.

  • The CH is selected by considering the objective factors, such as inter-cluster distance, intra-cluster distance, energy consumption of nodes, predicted energy, link lifetime (LLT), and delay.

The paper is organized as follows: Sect. 2 reviews different methods of selecting the CH in WSN, Sect. 3 explains the system model of WSN, and Sect. 4 presents the energy prediction. Section 5 presents the proposed method, Sect. 6 discusses the results of the proposed method, and Sect. 7 concludes the research.

2 Motivation

In this section, different existing CH selection-based approaches are explained along with their merits and demerits that motivate the researchers to design a TSBOA model for selecting CH in WSN.

2.1 Literature survey

Various CH selection-based methods are explained in this section. Maheshwari et al. [16] developed a butterfly optimization algorithm (BOA) for choosing the CH among the nodes in the network. The CH was selected using factors such as neighboring distance, the distance between nodes and the BS, node centrality, node degree, and node energy. Ant colony optimization (ACO) was employed for transmitting information between the CH and the BS, with routing based on parameters such as the degree of a node, distance, and energy. The performance was measured with different parameters. The method increased the network lifespan but failed to increase the performance in WSN. Doostali and Babamir [33] developed a clustering model based on a topology control mechanism in the sensor network. Here, the CH was selected by determining a near-optimal probability so as to reach higher efficiency in energy consumption. The probability value was specified using the count of nodes as well as the distance among neighboring nodes in the network. The clustering mechanism was based on the sleep-awake and learning automata model for increasing performance. It showed better energy consumption but failed to solve optimization issues in a distributed manner. Baradaran and Navi [18] developed a high-quality clustering algorithm (HQCA) to compute high-quality clusters. This method used a cluster-quality criterion that increased the intra-cluster and inter-cluster distances and reduced the error rate while clustering. The optimal CH was chosen using fuzzy logic based on parameters such as energy and distance. It increased the lifespan of the network but failed to analyze the performance. Vinitha and Rukmini [32] developed a Taylor-based cat salp swarm algorithm (C-SSA) for energy-efficient routing in the sensor network. At first, the CH was chosen with the LEACH model to make data communication efficient. The nodes transmit data to the BS through the CH. This method considered different trust factors, such as direct trust, integrity factor, indirect trust, and data forwarding rate. It increased the performance in terms of throughput and delay.

Nivedhitha et al. [7] developed an energy-efficient model for balancing energy consumption and the path reliability ratio. Initially, clusters were created, and then the route for transmitting data between the nodes was determined. A super CH was selected that was responsible for maintaining all the records of the CHs as well as the cluster members. The path reliability factor was used for packet routing. The method reduced the overhead but did not provide a security mechanism. Soundaram and Arumugam [10] developed an energy-efficient genetic spider monkey-based routing protocol (EGSMRP) for improving the lifespan of the network. This method has two phases, the setup phase and the steady-state phase. The CH was selected in the setup phase, whereas load balancing issues were solved in the steady-state phase. The control overhead was reduced by employing energy-based broadcasting. Rambabu et al. [34] developed a hybrid artificial bee colony and monarch butterfly optimization algorithm (HABC-MBOA) for selecting the CH. Data transmission was performed after the selection of the CH. The method increased the performance in terms of throughput but failed to stabilize energy in selecting the CH. Mehta and Saxena [24] developed an energy-aware optimized routing algorithm for effective routing in WSN. The CH was selected using a multi-objective function, which reduced the count of dead nodes and the consumption of energy. The optimal path was selected using a sailfish optimizer (SFO), which in turn made data communication effective. It increased the energy efficiency but failed to consider the mobility factor.

2.2 Challenges

Some of the issues faced by the traditional CH selection methods are discussed as follows:

  • In [18], the HQCA method is designed to generate clusters of high quality. However, this approach failed to include the factors, like optimal cluster estimation and an energy model based on the peripheral density of nodes to increase the performance.

  • A major challenge in the clustering method lies in resolving distributed convex optimization issues when computing the optimal global sub-graph [33].

  • In [7], DMEERP model is developed to improve the performance of WSN. This method failed to incorporate a secure routing process with a data availability model for balancing the security constraint with the elliptic curve cryptography model.

  • The EGSMRP method developed in [10] increased the lifespan of the network but failed to optimize Quality-of-Service (QoS) factors to increase the routing performance.

  • In [34], the HABC-MBOA method is modeled to choose the CH in WSN. It failed to consider the bacterial foraging algorithm (BFA) for handling energy efficiency issues while selecting CH.

In the proposed work, the CH is selected based on objective constraints such as delay, distance, LLT, and energy, which addresses the issues above. In addition, the paths between nodes are monitored for breakage so that routing is performed efficiently.

3 System model

The system model of WSN comprises the energy model, mobility model, LLT model, and free space model. The energy loss while communicating data between the nodes follows the free space model and is explained in the energy model. Let us assume a WSN with \(n\) nodes and a single sink node or BS, denoted \(X\). The wireless links between the nodes specify direct communication within the transmission range. Each node is distributed uniformly in the network environment. Each node has its own ID, and the nodes are formed into groups called clusters. The sink node is placed at a near-optimal location for receiving data packets from the sensor nodes linked in the network. Data communication from a cluster member to the BS is done through the CH. The set of nodes available in the network is represented as,

$$\chi = \left\{ {L_{1} ,L_{2} ,...,L_{a} ,....,L_{n} } \right\}$$
(1)

Here, \(L_{a}\) denotes the \(a^{th}\) sensor node and \(n\) denotes the total number of sensor nodes. Figure 1 signifies the system model of WSN.

Fig. 1 A system model of WSN

3.1 Energy model

Each node in the network has initial energy \(G_{0}\), and the energy of a node is not rechargeable [35]. The energy loss while sending packets from the \(a^{th}\) normal node to the \(b^{th}\) CH follows the multipath fading or the free space mechanism, depending on the distance between the sender and the receiver. The transmitter dissipates energy in a power amplifier and in the radio electronics, while the receiver dissipates energy only in the radio electronics. When a node sends \(q\) bytes of data, the energy dissipation of the node is represented as,

$$G_{disi} \left( {L_{a} } \right) = G_{elec} *q + G_{amp} *q*\left\| {L_{a} - V_{b} } \right\|^{4} \,;\,if\,\left\| {L_{a} - V_{b} } \right\|^{4} \ge c_{0}$$
(2)
$$G_{disi} \left( {L_{a} } \right) = G_{elec} *q + G_{fs} *q*\left\| {L_{a} - V_{b} } \right\|^{2} \,;\,if\,\left\| {L_{a} - V_{b} } \right\|^{2} < c_{0}$$
(3)

where \(G_{elec}\) denotes electronic energy that depends on the factors, such as modulation, spreading, filtering, amplifier, and digital coding.

$$G_{elec} = G_{trans} + G_{agg}$$
(4)

where \(G_{trans}\) indicates transmitter energy, \(G_{agg}\) specifies the energy of data aggregation, \(G_{amp}\) signifies the energy of the power amplifier, \(G_{fs}\) signifies the amplifier energy in the free space case, and \(\left\| {L_{a} - V_{b} } \right\|\) indicates the distance between the \(a^{th}\) node and the \(b^{th}\) CH. The energy dissipated by the CH while receiving \(q\) bytes of data is represented as,

$$G_{disi} \left( {V_{b} } \right) = G_{elec} *q$$
(5)

After sending or receiving \(q\) bytes of data, energy value of each node gets updated.

$$G_{d + 1} \left( {L_{a} } \right) = G_{d} \left( {L_{a} } \right) - G_{disi} \left( {L_{a} } \right)$$
(6)
$$G_{d + 1} \left( {V_{b} } \right) = G_{d} \left( {V_{b} } \right) - G_{disi} \left( {V_{b} } \right)$$
(7)

The above data transmission process is repeated until all nodes become dead. A node is considered dead when its energy falls below zero.
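The energy bookkeeping of Eqs. (2)-(7) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the branch condition compares the plain distance with the threshold \(c_{0}\) as in the conventional first-order radio model, and the numeric constants in the usage lines are illustrative values that do not come from this paper.

```python
def tx_energy(q, dist, g_elec, g_fs, g_amp, c0):
    """Energy dissipated by a member node sending q bytes over distance dist."""
    if dist >= c0:
        # Multipath fading branch, Eq. (2): distance raised to the 4th power.
        return g_elec * q + g_amp * q * dist ** 4
    # Free space branch, Eq. (3): distance squared.
    return g_elec * q + g_fs * q * dist ** 2


def rx_energy(q, g_elec):
    """Energy dissipated by the CH while receiving q bytes, Eq. (5)."""
    return g_elec * q


def update_energy(current, dissipated):
    """Remaining energy after one send or receive, Eqs. (6)-(7)."""
    return current - dissipated


def is_dead(energy):
    """A node is treated as dead once its energy falls below zero."""
    return energy < 0


# Illustrative constants (assumptions, not taken from the paper).
G_ELEC, G_FS, G_AMP, C0 = 50e-9, 10e-12, 0.0013e-12, 87.7
node_energy = update_energy(0.5, tx_energy(100, 10.0, G_ELEC, G_FS, G_AMP, C0))
ch_energy = update_energy(0.5, rx_energy(100, G_ELEC))
```

Keeping the sender and receiver costs in separate functions mirrors the asymmetry in the model: only the sender pays the distance-dependent amplifier term.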

3.2 Mobility model

The mobility model [36] is used to define the movement of sensor nodes and to specify how their acceleration, location, and velocity change over time. The mobility pattern is important in determining network performance. Let us consider the initial locations of nodes \(a\) and \(k\) as \(\left( {u_{1} ,v_{1} } \right)\) and \(\left( {u_{2} ,v_{2} } \right)\). Nodes \(a\) and \(k\) move with varying velocity, each in its own direction given by the angles \(\theta_{1}\) and \(\theta_{2}\), respectively. The Euclidean distance between nodes \(a\) and \(k\) is represented as,

$$D_{{\left( {ak,0} \right)}} = \sqrt {\left| {u_{1} - u_{2} } \right|^{2} + \left| {v_{1} - v_{2} } \right|^{2} }$$
(8)

Here, \(D\) denotes the Euclidean distance among the nodes.

3.3 LLT model

Due to the dynamic topology of the network, route reliability must be determined dynamically [37]. Let us consider two sensor nodes \(a\) and \(k\) that lie within transmission range of each other. The LLT is computed at each hop while the route request packet traverses the network: each node calculates the lifetime of the link between the current and previous hop. Let the coordinates of node \(a\) be \(\left( {{\rm M}_{a} ,{\rm N}_{a} } \right)\) and the coordinates of node \(k\) be \(\left( {{\rm M}_{k} ,{\rm N}_{k} } \right)\). The mobility speeds of node \(a\) and node \(k\) are represented as \(S_{a}\) and \(S_{k}\), and their directions of movement are given as \(\theta_{a}\) and \(\theta_{k}\). The LLT is computed as,

$$LLT = \frac{{ - \left( {\omega \lambda + \sigma \rho } \right) + \sqrt {\left( {\omega^{2} + \sigma^{2} } \right)\tau^{2} - \left( {\omega \rho - \lambda \sigma } \right)^{2} } }}{{\left( {\omega^{2} + \sigma^{2} } \right)}}$$
(9)
$${\text{where}},\,\omega = S_{a} \cos \theta_{a} - S_{k} \cos \theta_{k}$$
$$\lambda = {\rm M}_{a} - {\rm M}_{k}$$
$$\sigma = S_{a} \sin \theta_{a} - S_{k} \sin \theta_{k}$$
$$\rho = {\rm N}_{a} - {\rm N}_{k}$$
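Eq. (9) can be evaluated directly, as the sketch below shows. It assumes that \(\tau\) is the transmission range and that angles are given in radians; the function name and the handling of the two degenerate cases (zero relative velocity, negative discriminant) are choices made for this illustration and are not specified in the text.

```python
import math


def link_lifetime(pos_a, pos_k, speed_a, speed_k, theta_a, theta_k, tau):
    """Link lifetime between nodes a and k following Eq. (9).

    pos_a, pos_k: (M, N) coordinates; theta_*: direction in radians;
    tau: transmission range (assumed meaning of the symbol).
    """
    omega = speed_a * math.cos(theta_a) - speed_k * math.cos(theta_k)
    lam = pos_a[0] - pos_k[0]
    sigma = speed_a * math.sin(theta_a) - speed_k * math.sin(theta_k)
    rho = pos_a[1] - pos_k[1]

    denom = omega ** 2 + sigma ** 2
    if denom == 0:
        # Assumption: identical velocities keep the separation constant,
        # so a link that exists now never expires.
        return math.inf
    disc = denom * tau ** 2 - (omega * rho - lam * sigma) ** 2
    if disc < 0:
        # Assumption: the nodes never come within range of each other.
        return 0.0
    return (-(omega * lam + sigma * rho) + math.sqrt(disc)) / denom


# Node a starts 4 m behind node k on the same line and moves 1 m/s faster,
# so it overtakes k and leaves the 10 m range after 14 s.
llt = link_lifetime((0.0, 0.0), (4.0, 0.0), 2.0, 1.0, 0.0, 0.0, 10.0)
```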

4 Energy prediction based on deep long short term memory classifier

The Deep LSTM [38, 39] is employed to compute the predicted energy by taking the initial energy \(G_{0}\) of the sensor nodes in the network environment as input. The Deep LSTM classifier combines the merits of the deep network structure and the LSTM model, which addresses the vanishing gradient issue through memory cells. It comprises contextual state cells that work as long-term and short-term memory cells, and the predicted energy depends on the state of these memory cells. The input node \(Y_{m}\) of the classifier receives the initial energy value from the input layer of the deep network as well as the previous hidden state \(x_{m - 1}\). The prediction process is non-linear: a non-linear function is used in the computation because it increases the prediction accuracy. The result of feeding the initial energy and \(x_{m - 1}\) to the \(\tanh\) function is represented as,

$$Y_{m} = \tanh \left( {G_{0} \cdot g_{{YG_{0} }} + x_{m - 1} g_{Yx} + I_{in} } \right)$$
(10)

where \(g_{{YG_{0} }}\) denotes the weight matrix between the input layer and the input node of the memory cell, \(x_{m - 1}\) indicates the hidden state at time \(m - 1\), \(g_{Yx}\) specifies the weight matrix between the hidden states at different time steps, and \(I_{in}\) signifies the bias of the input node.

The input gate \(\left( {IG_{m} } \right)\) receives the same input as the input node. This unit uses the sigmoidal activation function, and because it can block the flow of input from other nodes to the current node, it is called the input gate. The operation of the input gate is represented as,

$$IG_{m} = \alpha \left( {G_{0} \cdot g_{{YG_{0} }} + x_{m - 1} g_{Yx} + I_{ig} } \right)$$
(11)

where \(IG_{m}\) specifies the input gate at a time \(m\), \(\alpha\) denotes sigmoidal activation function, and \(I_{ig}\) indicates bias to the input gate. Accordingly, the internal state \(w\) is a node with the self-loop recurrent edge of the unit weight and the linear activation function such that it is computed as,

$$w_{m} = IG_{m} \Theta Y_{m} + w_{m - 1}$$
(12)

where \(w_{m}\) specifies internal state at the time \(m\), and \(w_{m - 1}\) indicates the internal state at the time \(m - 1\). Accordingly, the forget gate \(FG\) is utilized to reinitiate the internal state of the memory cell such that it is computed as,

$$FG_{m} = \alpha \left( {G_{0} \cdot g_{{FG \cdot G_{0} }} + x_{m - 1} g_{FGx} + I_{fg} } \right)$$
(13)

where \(FG_{m}\) indicates the forget gate at time \(m\), \(\Theta\) signifies the pointwise multiplication operator, \(g_{{FG.G_{0} }}\) signifies the weight matrix between the forget gate and the input layer, \(g_{FGx}\) denotes the weight matrix between the forget gate and the hidden state, and \(I_{fg}\) signifies the bias of the forget gate.

Accordingly, the output gate \(T_{m}\) is computed as,

$$T_{m} = \alpha \left( {G_{0} \cdot g_{{xG_{0} }} + x_{m - 1} g_{Tx} + I_{og} } \right)$$
(14)

where \(g_{{xG_{0} }}\) indicates weight matrix among output gate and input layer, \(g_{Tx}\) portrays weight matrix among output gate and hidden state and \(I_{og}\) signifies bias for output gate. Moreover, the final output of the memory cell is represented as,

$$x_{m} = \tanh \left( {w_{m} } \right)\Theta T_{m}$$
(15)

where \(w_{m} = Y_{m} \Theta IG_{m} + w_{m - 1} \Theta FG_{m}\). The energy predicted by the Deep LSTM classifier is represented as \(G^{p}\). Figure 2 portrays the structure of the Deep LSTM classifier.

Fig. 2 The architecture of the Deep LSTM classifier
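A single memory-cell update following Eqs. (10), (11), (13), (14), and (15) might look like the sketch below. It is a scalar toy version for readability, not the trained Deep LSTM: the parameter names in the dictionary are hypothetical, each gate is given its own weights (the conventional LSTM arrangement, assumed here), and the internal state uses the form with the forget gate stated after Eq. (15).

```python
import math


def sigmoid(v):
    """Sigmoidal activation, written as alpha in Eqs. (11), (13), (14)."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_cell_step(g0, x_prev, w_prev, p):
    """One scalar memory-cell step.

    g0: initial node energy (cell input); x_prev: previous hidden state;
    w_prev: previous internal state; p: dict of hypothetical weights/biases.
    Returns the new hidden state x_m and internal state w_m.
    """
    y = math.tanh(g0 * p["g_y_in"] + x_prev * p["g_y_x"] + p["i_in"])    # Eq. (10)
    ig = sigmoid(g0 * p["g_ig_in"] + x_prev * p["g_ig_x"] + p["i_ig"])   # Eq. (11)
    fg = sigmoid(g0 * p["g_fg_in"] + x_prev * p["g_fg_x"] + p["i_fg"])   # Eq. (13)
    t = sigmoid(g0 * p["g_t_in"] + x_prev * p["g_t_x"] + p["i_og"])      # Eq. (14)
    w = y * ig + w_prev * fg    # internal state with forget gate
    x = math.tanh(w) * t        # cell output, Eq. (15)
    return x, w


PARAM_NAMES = ["g_y_in", "g_y_x", "i_in", "g_ig_in", "g_ig_x", "i_ig",
               "g_fg_in", "g_fg_x", "i_fg", "g_t_in", "g_t_x", "i_og"]
# Small constant weights chosen only to make the example run.
params = {name: 0.1 for name in PARAM_NAMES}
x_m, w_m = lstm_cell_step(0.5, 0.0, 0.0, params)
```

In a full model the scalars become vectors and matrices and the weights are learned; the order of operations stays the same.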

5 Proposed tunicate swarm butterfly optimization algorithm for CH selection

CH selection is significant in the WSN environment for making data transmission efficient. To reduce the communication delay and enhance routing performance, a CH must be chosen among the sensor nodes. Once a CH is selected, the cluster members do not communicate directly with the BS; instead, they interact directly with the CH, which sends the data packets to the BS. The CH is selected by considering objective factors such as intra-cluster and inter-cluster distance, the energy consumption of nodes, delay, predicted energy, and LLT, and the solution with the minimum objective function value is taken as the best solution. The CH is selected using the proposed TSBOA, which is derived by integrating TSA [40] and BOA [41]. TSA is a bio-inspired algorithm that mimics the swarm behavior and jet propulsion of tunicates. It considers two processes, namely foraging and navigation. Tunicates form swarms and generate blue-green light. Each tunicate is cylindrical, open at one end and closed at the other, and has a gelatinous tunic that helps to join all the individuals. The tunicate draws water from its surroundings and generates jet propulsion at the open end through the atrial siphons. This jet-like propulsion is powerful enough to move tunicates vertically in the ocean. The swarming behavior of tunicates helps them survive in the ocean. BOA is a nature-inspired algorithm that considers the mating and food-searching behavior of butterflies. In foraging, butterflies use their sense of smell to locate nectar or a mating partner. Figure 3 portrays the schematic view of the proposed TSBOA for CH selection.

Fig. 3 Schematic view of proposed TSBOA for CH selection

Solution encoding: It is the representation of the solution vector that determines the optimal node as CH by considering the objective parameters. The node with maximal distance and energy and minimum delay is selected as the CH. Figure 4 portrays solution encoding.

Fig. 4 Solution encoding

Fitness function: The fitness value is computed by employing parameters, like intra-cluster distance, inter-cluster distance, delay, energy of the nodes, LLT, and the predicted energy as the objective factors. The fitness function is computed as,

$$F = \frac{1}{6}\left[ {D^{{\text{intra}}} + \left( {1 - D^{{\text{inter}}} } \right) + G_{cons} + B + \left( {1 - LLT} \right) + G^{p} } \right]$$
(16)

where \(D^{{\text{intra}}}\) denotes intra-cluster distance, \(D^{{\text{inter}}}\) signifies inter-cluster distance, \(G_{cons}\) specifies the energy consumption of the nodes, \(B\) denotes delay, \(LLT\) denotes link lifetime, and \(G^{p}\) specifies the predicted energy of nodes computed by the Deep LSTM classifier. The intra-cluster distance is specified as,

$$D^{{\text{intra}}} = \frac{1}{{K_{1} *n*\vartheta }}\sum\limits_{a = 1}^{n} {\sum\limits_{\begin{subarray}{l} b = 1 \\ a \ne b \end{subarray} }^{\vartheta } {D_{ab} } }$$
(17)

Here, \(D_{ab}\) denotes the distance between the \(a^{th}\) node and the \(b^{th}\) CH, \(n\) specifies the number of nodes, \(\vartheta\) represents the number of CHs, and \(K_{1}\) indicates the normalizing factor. The inter-cluster distance is computed as,

$$D^{{\text{inter}}} = \frac{1}{{K_{2} *\vartheta }}\sum\limits_{b = 1}^{\vartheta } {\sum\limits_{i = 1}^{\vartheta } {D_{bi} } }$$
(18)

where \(K_{2}\) denotes normalizing factor, and \(D_{bi}\) signifies distance between \(b^{th}\) CH and \(i^{th}\) CH. The energy consumption of nodes is represented as,

$$G_{cons} = \sum\limits_{a = 1}^{n} {\frac{{G_{a}^{trans} *G_{a}^{rcvr} }}{K}}$$
(19)

where \(G^{trans}\) indicates transmitter energy, \(G^{rcvr}\) indicates receiver energy and \(K\) denotes normalizing factor. The delay is represented as,

$$B = \sum\limits_{b = 1}^{\vartheta } {\frac{{L_{b} }}{n}}$$
(20)

Here, \(L_{b}\) denotes the number of nodes in the cluster of the \(b^{th}\) CH.
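The objective terms of Eqs. (16)-(20) can be put together as in the following sketch. It is an illustration under stated assumptions rather than the authors' code: function and argument names are hypothetical, distances are supplied as precomputed nested lists, the \(a \ne b\) exclusion in Eq. (17) is not modeled separately (every supplied entry is summed), and all six terms are assumed to be normalized to comparable ranges before they are averaged.

```python
def intra_cluster_distance(dist_node_ch, k1):
    """Eq. (17): dist_node_ch[a][b] is the distance from node a to CH b."""
    n = len(dist_node_ch)
    num_ch = len(dist_node_ch[0])
    total = sum(sum(row) for row in dist_node_ch)
    return total / (k1 * n * num_ch)


def inter_cluster_distance(dist_ch_ch, k2):
    """Eq. (18): dist_ch_ch[b][i] is the distance from CH b to CH i."""
    num_ch = len(dist_ch_ch)
    return sum(sum(row) for row in dist_ch_ch) / (k2 * num_ch)


def energy_consumption(e_trans, e_rcvr, k):
    """Eq. (19): normalized sum of per-node transmit and receive energy products."""
    return sum(t * r for t, r in zip(e_trans, e_rcvr)) / k


def delay(cluster_sizes, n):
    """Eq. (20): cluster sizes relative to the total number of nodes."""
    return sum(size / n for size in cluster_sizes)


def fitness(d_intra, d_inter, g_cons, b, llt, g_pred):
    """Eq. (16): average of the six objective terms; lower is better."""
    return (d_intra + (1 - d_inter) + g_cons + b + (1 - llt) + g_pred) / 6


# Two candidate CH sets described by already-normalized terms (made-up values).
candidate_1 = fitness(0.2, 0.7, 0.1, 0.5, 0.8, 0.3)
candidate_2 = fitness(0.4, 0.5, 0.2, 0.5, 0.6, 0.3)
best = min(candidate_1, candidate_2)
```

Because the best solution is the one with the minimum value, terms that should be large (inter-cluster distance, LLT) enter as one minus the term.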

5.1 Algorithmic procedure of proposed TSBOA

The algorithmic steps involved in the proposed TSBOA are explained as follows:

  i) Initialization: Let us initialize the population of \(z\) butterflies in the solution space as \(A_{j} \left( {j = 1,2,...,z} \right)\). Sensory refers to measuring the form of energy, whereas modality refers to the raw input used by the sensors. The stimulus is correlated with the fitness of the solution.

  ii) Compute objective function: The fitness measure is computed to select the node as CH by considering the objective factors, such as inter-cluster distance, intra-cluster distance, delay, LLT, the energy consumption of nodes, and predicted energy. The fitness function computed for selecting CH is specified in Eq. (16).

  iii) Update solution: The sensing concept, as well as the modality processing, is based on factors, like sensory modality \(\left( M \right)\), stimulus intensity \(\left( W \right)\), and the power exponent \(\left( C \right)\), respectively.

The standard equation of BOA is expressed as,

$$A_{j}^{r + 1} = A_{j}^{r} + \left( {h^{2} *s^{*} - A_{j}^{r} } \right) \times y_{j}$$
(21)

where \(A_{j}^{r}\) denotes the solution of the \(j^{th}\) butterfly at the \(r^{th}\) iteration, \(s^{*}\) indicates the current best solution, \(y_{j}\) indicates the fragrance of the \(j^{th}\) butterfly, and \(h\) specifies a random number in the range \(\left[ {0,1} \right]\). The standard equation of TSA under the condition \(\left( {rand \ge 0.5} \right)\) is expressed as,

$$A_{j}^{r} = \vec{J} + \vec{R} \cdot \vec{Q}$$
(22)
$$\vec{J} = A_{j}^{r} - \vec{R} \cdot \vec{Q}$$
(23)

where \(\vec{J}\) indicates the best search agent, and \(\vec{Q}\) specifies prey density. As \(\vec{J}\) is the best search agent in TSA, \(\vec{J} = s^{*}\). Substituting Eq. (23) in Eq. (21) gives,

$$A_{j}^{r + 1} = A_{j}^{r} + \left( {h^{2} *\left( {A_{j}^{r} - \vec{R} \cdot \vec{Q}} \right) - A_{j}^{r} } \right) \times y_{j}$$
(24)
$$A_{j}^{r + 1} = A_{j}^{r} + h^{2} A_{j}^{r} y_{j} - h^{2} \vec{R} \cdot \vec{Q}y_{j} - A_{j}^{r} y_{j}$$
(25)
$$A_{j}^{r + 1} = A_{j}^{r} \left( {1 + h^{2} y_{j} - y_{j} } \right) - h^{2} \vec{R} \cdot \vec{Q}y_{j}$$
(26)

where \(A_{j}^{r}\) denotes the solution of \(j^{th}\) butterfly at \(r^{th}\) iteration, \(h\) specifies random number with the range of \(\left[ {0,1} \right]\), and \(y_{j}\) indicates fragrance of \(j^{th}\) butterfly. Here,

$$\vec{R} = \frac{{\vec{H}}}{{\vec{Z}}}$$
(27)
$$\vec{H} = h_{2} + h_{3} - \vec{p}$$
(28)
$$\vec{p} = 2 \cdot h_{1}$$
(29)

Here, \(\vec{R}\) specifies a vector, and \(h_{1}\), \(h_{2}\), and \(h_{3}\) represent random numbers in the range \(\left[ {0,1} \right]\).

  iv) Evaluating feasibility: The optimal solution is computed by determining the best fitness value such that a node with minimum fitness value is accepted as CH.

  v) Termination: The above steps are repeated until the best solution is obtained. Algorithm 1 portrays the pseudo-code of the proposed TSBOA.

Algorithm 1 Pseudo-code of the proposed TSBOA
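The update rule derived in Eqs. (21)-(26) can be checked numerically with a short sketch. The code below is only an illustration: function names are hypothetical, positions are plain lists updated per dimension, the product \(\vec{R} \cdot \vec{Q}\) is treated as an element-wise product (an assumption), and \(\vec{R}\), \(\vec{Q}\), \(h\), and \(y_{j}\) are passed in as fixed values instead of being sampled.

```python
def boa_update(a, s_best, h, y):
    """Standard BOA step, Eq. (21), applied to each dimension of position a."""
    return [ai + (h ** 2 * si - ai) * y for ai, si in zip(a, s_best)]


def tsboa_update(a, r, q, h, y):
    """Combined TSBOA step, Eq. (26), applied to each dimension of position a."""
    return [ai * (1 + h ** 2 * y - y) - h ** 2 * ri * qi * y
            for ai, ri, qi in zip(a, r, q)]


# Fixed example values (made up for the illustration).
a = [1.0, 2.0]      # current solution A_j^r
r = [0.5, -0.5]     # vector R
q = [2.0, 4.0]      # prey density Q
h, y = 0.5, 0.4     # random number and fragrance

# Eq. (23): best agent expressed through the TSA relation, J = A - R * Q.
s_best = [ai - ri * qi for ai, ri, qi in zip(a, r, q)]

via_boa = boa_update(a, s_best, h, y)    # Eq. (21) with s* = J
via_tsboa = tsboa_update(a, r, q, h, y)  # Eq. (26)
```

Both routes give the same new position, which confirms that Eq. (26) is Eq. (21) with the best solution replaced by the TSA expression of Eq. (23).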

5.2 Route maintenance

The route maintenance phase examines the path and reports the failure of a link to the source node. The path between the cluster member and the CH must be optimal for data transmission to be efficient. Moreover, it is necessary to detect link breakage in the network based on the rate of link failure [42]. Accordingly, link reliability is represented as,

$$probability\left( l \right) = e^{ - \mu \,l}$$
(30)

where \(\mu\) denotes average link failure rate that is computed as \(\left( \frac{1}{LLT} \right)\), and \(l\) indicates the time at which the link is active. If link reliability is greater than the threshold value, then the routing process can be effectively accomplished between the nodes in the network.
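The reliability check of Eq. (30) reduces to a few lines, sketched below. The function names and the threshold value used in the example are hypothetical; the text does not state which threshold is used.

```python
import math


def link_reliability(llt, active_time):
    """Eq. (30): probability that a link is still up after active_time."""
    mu = 1.0 / llt                      # average link failure rate
    return math.exp(-mu * active_time)


def link_usable(llt, active_time, threshold):
    """Route over the link only if its reliability exceeds the threshold."""
    return link_reliability(llt, active_time) > threshold


# A link with LLT = 10 s that has been active for 1 s, checked against an
# illustrative threshold of 0.5 (not a value from the paper).
ok = link_usable(10.0, 1.0, 0.5)
```

A longer LLT lowers the failure rate, so the same link stays above the threshold for longer.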

6 Results and discussion

This section describes the results and discussion of the proposed TSBOA for the performance measures.

6.1 Experimental setup

The proposed method is implemented in MATLAB on Windows 10 with an Intel processor and 4 GB RAM. The simulation parameters used for the experimentation are shown in Table 1.

Table 1 Simulation parameter

6.2 Evaluation metrics

The performance of the developed TSBOA is assessed using parameters such as alive nodes, residual energy, and throughput.

Throughput: It is the measure of the number of packets transmitted from source to destination per second. It is represented as,

$$Tp = \frac{\delta }{t}$$
(31)

where, \(\delta\) specifies successfully transmitted packets, and \(t\) represents time.

Alive nodes: It is the number of nodes in the network that are still alive, with enough energy for routing the data packets.

Residual energy: It is the energy that remains in the nodes, obtained from the remaining energy of all the nodes.

$$G_{res} = \frac{{G_{0} - G_{cons} }}{{G_{0} }}$$
(32)
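Eqs. (31) and (32) translate directly into code. The sketch below uses hypothetical function names and made-up input values purely to show the arithmetic.

```python
def throughput(delivered_packets, elapsed_seconds):
    """Eq. (31): successfully transmitted packets per unit time."""
    return delivered_packets / elapsed_seconds


def residual_energy(g0, g_cons):
    """Eq. (32): fraction of the initial energy g0 left after consuming g_cons."""
    return (g0 - g_cons) / g0


# Example values chosen for illustration only.
tp = throughput(500, 20.0)
g_res = residual_energy(0.5, 0.125)
```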

6.3 Experimental results

Figure 5 portrays sample simulation results for 50 and 100 nodes. Figure 5a shows the simulated result with 50 nodes, and Fig. 5b shows the simulated result with 100 nodes.

Fig. 5 Simulated result, a with 50 nodes, b with 100 nodes

6.4 Performance analysis

This section elaborates the performance analysis of the developed TSBOA model based on the number of rounds.

6.4.1 Analysis based on 50 nodes

Figure 6 shows the analysis based on 50 nodes. The number of alive nodes is represented in Fig. 6a. At 500 rounds, the number of alive nodes obtained by the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 41, 40, 45, 47.5, and 50, respectively. At 1000 rounds, the corresponding values are 40.59, 39.6, 44.55, 47.025, and 49.5. At 1500 rounds, they are 26.24, 25.6, 28.8, 30.4, and 32. At 2000 rounds, they are 2.05, 2, 2.25, 2.375, and 2.5. Thus, for every epoch setting, the number of alive nodes decreases as the number of rounds increases.

Fig. 6 Analysis based on 50 nodes: a alive nodes, b residual energy, c throughput

Figure 6b depicts the analysis based on the residual energy. At 500 rounds, the energy remaining in the nodes using the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 0.2358J, 0.2489J, 0.4072J, 0.2620J, and 0.4286J, respectively. At 1000 rounds, the corresponding values are 0.0742J, 0.0783J, 0.2062J, 0.0824J, and 0.2171J. At 1500 rounds, they are 0.0056J, 0.0059J, 0.0516J, 0.0062J, and 0.0543J. At 2000 rounds, they are 0.0026J, 0.0028J, 0.0030J, 0.0029J, and 0.0031J.

The analysis based on throughput is illustrated in Fig. 6c. At 500 rounds, the throughput achieved by the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 68.879%, 71.310%, 72.931%, 76.172%, and 81.034%, respectively. At 1000 rounds, the corresponding values are 66.328%, 68.669%, 70.230%, 73.351%, and 78.033%. At 1500 rounds, they are 48.139%, 49.838%, 50.970%, 53.236%, and 56.634%. At 2000 rounds, they are 13.331%, 13.802%, 14.115%, 14.743%, and 15.684%.

6.4.2 Analysis with 100 nodes

Figure 7 shows the performance analysis with 100 nodes. Figure 7a shows the analysis based on alive nodes. At 500 rounds, the number of alive nodes obtained by the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 82, 80, 90, 95, and 100, respectively. At 1000 rounds, the corresponding values are 81.18, 79.2, 89.1, 94.05, and 99. At 1500 rounds, they are 52.48, 51.2, 57.6, 60.8, and 64. At 2000 rounds, they are 4.1, 4, 4.5, 4.75, and 5.

Fig. 7 Analysis based on 100 nodes: a alive nodes, b residual energy, c throughput

The analysis based on the residual energy of the nodes is depicted in Fig. 7b. At 500 rounds, the residual energy computed by the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 0.2344J, 0.2474J, 0.4048J, 0.2604J, and 0.4261J, respectively. At 1000 rounds, the corresponding values are 0.0737J, 0.0778J, 0.2050J, 0.0819J, and 0.2158J. At 1500 rounds, they are 0.0055J, 0.0059J, 0.0512J, 0.0062J, and 0.0539J. At 1800 rounds, they are 0.0035J, 0.0037J, 0.0136J, 0.0039J, and 0.0144J.

Figure 7c represents the analysis based on throughput with 100 nodes. At 500 rounds, the throughput computed by the proposed TSBOA with epochs 20, 40, 60, 80, and 100 is 68.879%, 71.310%, 72.931%, 76.172%, and 81.034%, respectively. At 1000 rounds, the corresponding values are 66.328%, 68.669%, 70.230%, 73.351%, and 78.033%. At 1500 rounds, they are 48.139%, 49.838%, 50.970%, 53.236%, and 56.634%. When the rounds are increased to 1800, they are 13.331%, 13.802%, 14.115%, 14.743%, and 15.684%.

6.5 Comparative methods

The performance of the proposed method is compared against existing methods, namely Butterfly Optimization Algorithm (BOA) + Ant Colony Optimization (ACO) [16], Taylor based Cat Salp Swarm Algorithm (Taylor C-SSA) [32], Genetic Spider Monkey Optimization (GSMO) [10], and Hybrid Artificial Bee Colony and Monarchy Butterfly Optimization Algorithm (HABC-MBOA) [34].

6.6 Comparative analysis

This section explains the comparative analysis of the developed TSBOA over the number of rounds for different numbers of nodes.

6.6.1 Analysis with 50 nodes

Figure 8 illustrates the comparative analysis of the TSBOA scheme over varying rounds with 50 nodes. Figure 8a shows the analysis based on the number of alive nodes. At 100 rounds, the number of alive nodes in the network using BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 39.6, 35.1, 36, 41.4, and 45, respectively. Here, the performance of the proposed algorithm is 12% better than BOA + ACO, 22% better than Taylor C-SSA, 20% better than GSMO, and 8% better than HABC-MBOA. When the rounds are increased to 200, the alive nodes measured by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA are 38.808, 34.398, 35.28, 40.572, and 44.1, respectively. After the proposed method, HABC-MBOA retains the most alive nodes, followed by BOA + ACO, whereas Taylor C-SSA retains the fewest.

Fig. 8 Analysis with 50 nodes: a alive nodes, b residual energy, c throughput

The analysis based on the residual energy metric is shown in Fig. 8b. At 80 rounds, the residual energy of the existing BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA is 0.4727J, 0.4521J, 0.4778J, and 0.5035J, whereas the proposed TSBOA retains a higher energy of 0.5293J, which is a performance improvement of 11%, 15%, 10%, and 5% over BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA, respectively. At 100 rounds, the residual energy of BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.4569J, 0.4371J, 0.4619J, 0.4867J, and 0.5116J, respectively. The performance improvement of the proposed algorithm is 11% over BOA + ACO, 15% over Taylor C-SSA, 10% over GSMO, and 5% over HABC-MBOA. When the rounds are increased to 130, the residual energy computed by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.4421J, 0.4229J, 0.4469J, 0.4709J, and 0.4949J, respectively. Here, the performance improvement of the proposed algorithm is again 11% over BOA + ACO, 15% over Taylor C-SSA, 10% over GSMO, and 5% over HABC-MBOA.
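The improvement percentages quoted in the comparative analysis can be reproduced with a short Python sketch. The normalization is an assumption inferred from the reported numbers and is not stated explicitly in the text: the gain appears to be the difference between the proposed and existing values divided by the proposed value. The function name is hypothetical, and the inputs are the residual energy values reported at 80 rounds with 50 nodes.

```python
def improvement_percent(proposed: float, baseline: float) -> float:
    """Relative gain of the proposed method, normalized by the proposed value.

    Assumed formula (inferred from the reported figures):
        (proposed - baseline) / proposed * 100
    """
    if proposed == 0:
        raise ValueError("proposed value must be non-zero")
    return (proposed - baseline) / proposed * 100.0


# Residual energy (J) at 80 rounds with 50 nodes, as reported in the text.
baselines = {
    "BOA + ACO": 0.4727,
    "Taylor C-SSA": 0.4521,
    "GSMO": 0.4778,
    "HABC-MBOA": 0.5035,
}
proposed_tsboa = 0.5293

# Round to whole percentages, matching the precision used in the text.
gains = {
    name: round(improvement_percent(proposed_tsboa, value))
    for name, value in baselines.items()
}
```

With this normalization, `gains` evaluates to 11, 15, 10, and 5 for BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA, which agrees with the percentages reported for the residual energy at 80 rounds.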

Figure 8c represents the analysis of throughput using 50 nodes. At 300 rounds, the throughput computed by the existing methods BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA is 58.593%, 71.705%, 74.266%, and 78.534%, whereas the proposed TSBOA achieves a higher throughput of 85.363%, which is a performance improvement of 31% over BOA + ACO, 16% over Taylor C-SSA, 13% over GSMO, and 8% over HABC-MBOA. At 400 rounds, the throughput measured by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 58.593%, 71.705%, 74.266%, 78.534%, and 85.363%, respectively. Here, the performance improvement of the proposed algorithm is 31% over BOA + ACO, 16% over Taylor C-SSA, 13% over GSMO, and 8% over HABC-MBOA. At 500 rounds, the throughput obtained by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 58.593%, 71.705%, 74.266%, 78.534%, and 85.363%, respectively.

6.6.2 Analysis with 100 nodes

Figure 9 shows the comparative analysis of the proposed TSBOA method over varying rounds with 100 nodes. Figure 9a illustrates the analysis based on the number of alive nodes. At 100 rounds, the number of alive nodes in the network using BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 44, 39, 40, 46, and 50, respectively. Here, the performance of the proposed algorithm is 12% better than BOA + ACO, 22% better than Taylor C-SSA, 20% better than GSMO, and 8% better than HABC-MBOA. When the rounds are increased to 200, the alive nodes measured by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA are 43.12, 38.22, 39.2, 45.08, and 49, respectively.

Fig. 9 Analysis with 100 nodes: a alive nodes, b residual energy, c throughput

The analysis based on the residual energy measure is shown in Fig. 9b. At 800 rounds, the residual energy of the existing BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA is 0.1493J, 0.1433J, 0.1508J, and 0.1584J, while the proposed TSBOA retains a higher energy of 0.1659J. At 900 rounds, the residual energy of BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.1258J, 0.1208J, 0.1271J, 0.1333J, and 0.1395J, respectively. When the rounds are increased to 1000, the residual energy obtained by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.1016J, 0.0977J, 0.1026J, 0.1074J, and 0.1123J, respectively.

Figure 9c shows the analysis of throughput using 100 nodes. At 300 rounds, the throughput computed by the existing methods BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA is 68.741%, 71.167%, 72.785%, and 76.020%, while the proposed TSBOA achieves a better throughput of 80.872%, which is reported as an improvement of 15%, 12%, 10%, and 5% over BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA, respectively. At 400 rounds, the throughput achieved by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 68.741%, 71.167%, 72.785%, 76.020%, and 80.872%, so the developed model again shows an improvement of 15%, 12%, 10%, and 5% over BOA + ACO, Taylor C-SSA, GSMO, and HABC-MBOA. At 500 rounds, the throughput computed by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 68.741%, 71.167%, 72.785%, 76.020%, and 80.872%, respectively.

6.7 Comparative discussion

Table 2 presents the comparative discussion of the proposed TSBOA model. The table compares the performance of the proposed TSBOA model with that of the conventional methods using the values computed at 1000 rounds for each evaluation metric. The residual energy computed by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.1012J, 0.0973J, 0.1022J, 0.1070J, and 0.1118J for 50 nodes. The throughput computed at 1000 rounds by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 56.354%, 68.965%, 71.428%, 75.533%, and 82.101% with 50 nodes. Similarly, at 1000 rounds, the residual energy measured by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 0.1016J, 0.0977J, 0.1026J, 0.1074J, and 0.1123J with 100 nodes. Moreover, the throughput computed by BOA + ACO, Taylor C-SSA, GSMO, HABC-MBOA, and the proposed TSBOA is 66.195%, 68.532%, 70.089%, 73.204%, and 77.877% with 100 nodes, respectively.

Table 2 Comparative discussion

7 Conclusion

In this research, an effective method is developed for selecting the CH in the WSN environment using the proposed TSBOA, which combines TSA and BOA. The proposed method selects the CH optimally by considering fitness constraints, such as inter-cluster distance, intra-cluster distance, delay, LLT, energy consumption of the nodes, and predicted energy. The predicted energy is computed using a Deep LSTM classifier based on the initial energy value. The transmission energy, receiver energy, and the normalizing factors are used for computing the energy consumption among the nodes. The reliability of the routing path is measured in the route maintenance phase by monitoring link breakage. The link reliability factor is compared with the threshold value to route the data packets between the source and destination. At 1000 rounds with 50 nodes, the proposed method achieved a residual energy of 0.1118J and a throughput of 82.101%, higher than the compared methods. Future work will consider applying the optimization algorithm to the routing process as well, so that the performance can be improved further.