1 Introduction

Wireless Sensor Networks (WSNs) are being actively researched because they can be widely used in Internet of Things (IoT) based monitoring systems. Applications such as precision agriculture, industrial IoT, home automation, and e-health care systems have been developed in recent years (Curry and Smith 2016; Stankovic 2014). WSNs are a special class of ad hoc networks that can be deployed instantly and anywhere (Khan et al. 2016). WSNs are predominantly used in monitoring applications to observe activities happening in the Field of Interest (FOI). In Kumar and Ilango (2018) and Ojha et al. (2015), experimental setups for farmland monitoring systems are discussed, which give a glimpse of current WSNs for precision agriculture. A pervasive framework for e-health care systems is studied by Riazul et al. (2015). Readers can refer to Liu et al. (2017), Minoli et al. (2017), and Owojaiye and Sun (2013) for various monitoring applications based on WSN and IoT technology.

For environmental monitoring applications, the network structure consists of sensor nodes deployed in given FOI as shown in Fig. 1. The sensor nodes are divided into clusters. A Cluster Head (CH) node is then selected in each cluster. The sensed information is transmitted to an end destination called the sink. The one-time data transmission from sensor nodes to sink via CHs is popularly known as one round of data transmission. The sink is usually considered to be an access point or a central processing unit with high power signal processing and transmission capabilities. This sink can further be connected to the Internet to push the sensed information on web database servers for remote users.

Fig. 1 WSN deployed for IoT based monitoring system

The sensor node is a low power processor embedded with sensors and radio circuits. These nodes can be deployed in unmanned areas, farmlands, etc. to detect environmental conditions. The sensor node is a resource-constrained device and thus has limited battery life. The battery energy of sensor nodes is mainly drained by wireless communication because the transmission energy increases with the distance over which transmission takes place (Tudose et al. 2013). Thus, a WSN needs to be energy efficient so as to prolong the network lifetime.

Over the past two decades, a lot of research has been carried out to develop energy-efficient clustering protocols that prolong the network lifetime. The WSN is considered stable as long as all the deployed sensor nodes are alive, because the network topology starts changing as the number of alive nodes decreases (Nayak and Devulapalli 2016; Heinzelman et al. 2000). The WSN is said to be sustainable when a maximal number of sensor nodes stay alive for a long period of time. The network lifetime is measured as First Node Dead (FND), Half of the Nodes Dead (HND) and Last Node Dead (LND) (Sohal et al. 2018; Zhu and Vasilakos 2016; Amgoth and Jana 2015; Tarhani et al. 2014).

With the emerging IoT technology, WSN is envisioned to increase its scalability (Rault et al. 2014). Scalability is counted in terms of node density or coverage area. The coverage area is the FOI where the WSN is deployed. If the FOI is increased, then the communication distances also increase. This ultimately increases the energy consumption of the sensor nodes. Thus, a trade-off occurs between energy efficiency and the coverage area. It is a great challenge to balance this trade-off while attaining sustainable network performance. This motivated us to research a clustering protocol that balances the above-mentioned trade-off.

Usually, node deployment is random for large scale networks. Thus, it is another challenge to cluster the nodes with minimized intra-cluster communication distances so as to minimize energy consumption. To achieve minimized energy consumption, optimization techniques like fuzzy logic, genetic algorithms, etc. are used in the literature (Valdez et al. 2014). Fuzzy-c-means (FCM) is a clustering algorithm that is robust to ambiguity. It forms clusters based on the given point locations of the sensor nodes in the network (Havens et al. 2012). Hence, it is suitable for clustering in WSNs (Su and Zhao 2018; Moh 2014). As the node deployment is random, the communication distances between sensor nodes are estimated based on Received Signal Strength (RSS). RSS is affected by environmental conditions and hence the estimated communication distances may not be accurate. Due to such ambiguity in the signal parameter, the WSN is uncertain. This uncertainty of the network can be tackled using a Fuzzy Logic System (FLS).

The above-discussed challenges and the features of FCM and FLS motivated us to propose a semi-distributed clustering protocol for scalable WSNs. The FCM algorithm is used to cluster the sensor nodes in order to reduce the intra-cluster communication distances. The FLS is used to model the CH selection process considering fair and worst conditions of the environment. The major contributions of our work are as follows:

  1.

    In literature, the cluster center point is estimated using network dimensions (Jia et al. 2016) or based on node density (Balakrishnan and Balachandran 2017; Su and Zhao 2018). This may not reduce the intra-cluster communication distances of the nodes located at cluster boundaries. The cluster center points have great impact on the cluster formation process. In this work, the FCM algorithm is used to cluster the nodes. Cluster center points are estimated based on node locations. This reduces intra-cluster communication distances of the cluster members in the network.

  2.

    The wireless links are disrupted due to noisy environmental conditions, which affect the RSS. The communication distances are estimated from RSS. This uncertainty in WSN signal parameters is modeled appropriately using FLS. The fuzzy rules are properly defined considering energy level and the communication distances of the sensor nodes. Unlike the conventional protocols (Zhang et al. 2017), the parameters used are minimal to lower the complexity of the system.

  3.

    Sensor nodes are homogeneous in nature. At the initial rounds, tie cases for CH selection may occur between the nodes with the same energy levels. In this work, CH is selected based on FLS. Such tie cases are resolved by properly defining membership functions for fuzzy inputs and outputs of the FLS. Due to this, network stability is improved.

  4.

    The proposed FCM algorithm based clustering and FLS based CH selection protocol enhances the network lifetime compared to similar recent conventional protocols in case of an increase in node density as well as FOI. This is because the intra-cluster communication distances are reduced significantly. The proposed clustering protocol provides stable and sustainable network performance in various scenarios. The results are discussed in detail in Sect. 5. Thus, the proposed protocol can be used for a variety of IoT based environmental monitoring applications, where the sink can be located inside or outside the network.

The rest of the article is organized as follows: Sect. 2 reviews the clustering techniques. Section 3 explains the system model for energy computation. Section 4 gives a detailed illustration of the proposed protocol. Section 5 contains the simulation results with discussions. The paper is concluded in Sect. 6.

2 Clustering techniques

In this section, conventional WSN clustering protocols are discussed. LEACH protocol is the base clustering protocol of WSN (Heinzelman et al. 2000). It selects CHs based on the random probabilistic model. The protocol has two phases: setup and steady-state. In the setup phase, a node n generates a random number within the range of [0, 1]. It then compares the random number with a threshold value given by (1). If the generated number is less than the threshold \(Th\left(n\right)\), the node becomes CH for the current round. The data transmission from nodes to CH and CH to sink takes place in the steady-state phase.

$$Th\left(n\right)=\frac{p}{1-p\left(r \bmod \frac{1}{p}\right)} \quad \text{if}\,\, n \in G$$
(1)

Here, p is the desired percentage of CHs in the network. The current round is denoted by r. G is the set of nodes that have not been CHs in the past 1/p rounds, and n is a node that belongs to G. The probability-based selection depends on the random number generated and the number of times a node was CH in past rounds. Due to this, nodes with low energy are also selected as CHs, which causes the early death of such nodes. In the proposed work, fuzzy rules are designed to select a node with adequate energy as CH. This improves the FND performance of the proposed protocol.
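The threshold test in (1) can be illustrated with a minimal Python sketch. The function names are hypothetical, and returning a threshold of 0 for nodes outside G follows the usual LEACH convention rather than anything stated in (1) itself.

```python
import random


def leach_threshold(p, r, in_G):
    """Threshold Th(n) from (1) for round r and desired CH fraction p.

    Assumption: nodes outside G get a threshold of 0, so they can never
    become CH in the current round (usual LEACH convention).
    """
    if not in_G:
        return 0.0
    period = round(1 / p)  # 1/p rounds form one rotation cycle
    return p / (1 - p * (r % period))


def elects_itself(p, r, in_G, rng=random):
    """A node becomes CH when its random draw is below the threshold."""
    return rng.random() < leach_threshold(p, r, in_G)
```

Note that the threshold depends only on p, r and membership in G, never on the node's remaining energy, which is why low-energy nodes can still be elected.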

The Scalable Energy Efficient Clustering Hierarchy (SEECH) protocol is executed in three phases (Tarhani et al. 2014). These are start phase, setup phase, and steady-state phase. In the start phase, a set of possible CH nodes is selected based on residual energy and the number of neighbor nodes. From this set, few nodes are eliminated based on their probable energy ratio. Thus, nodes having more energy are selected as CHs, while others are used as intermediate nodes to forward the cluster data towards the sink. The energy consumption of the overall network in each round is more as both CHs and intermediate nodes consume energy for data forwarding. In the case of the proposed protocol, clusters are formed only once at the start of the protocol. Thus, it reduces the computations and communications required by the cluster formation phase in each round.

A distributed cluster computing energy-efficient routing scheme to balance the network load is proposed by Chang (2015). Here, a query-based cluster formation is implemented. A node declares itself as an originator node if its arbitrary counter times out. The originator nodes then form clusters with the neighbor nodes. These nodes also calculate the selection factor for each cluster member to select a CH. Thus, originators are overloaded and consume more energy. The protocol has no control over the number of clusters formed per round because the counters of many nodes may time out simultaneously. The number of isolated nodes in the network increases as the number of dead nodes increases. Unlike Chang (2015), in the proposed work, CH computation is carried out by each node. Hence, nodes are not overloaded and energy consumption is balanced among the nodes. Also, the isolated node issue is resolved by implementing FCM based clustering at the sink. Thus, each node is considered and associated with one of the clusters formed. As clustering is carried out at the sink and only once at the start of the protocol, the cluster computation cost is reduced compared to the distributed protocols.

A topology control of tactical WSN using energy-efficient zone routing is proposed by Thulasiraman and White (2016). The FOI is partitioned into rectangular zones. The CH is selected from a set of alive nodes in the zone based on the energy level. The protocol has three phases: network setup, CH selection, and data transmission. The protocol allows nodes from a particular zone to sustain for maximal rounds, while other zones are uncovered. It is useful in tactical scenarios, where monitoring of only spot areas is required rather than a sparse region. In this work, the protocol is proposed for large scale environmental monitoring applications, which can be implemented on the IoT platform.

A Dynamic Cluster Head Selection Method (DCHSM) for static cluster formation using the Voronoi diagram is proposed by Jia et al. (2016). Sensor nodes within each Voronoi cell form a cluster. A CH is then selected for each cell in two stages. In stage one, a set of nodes based on the probabilistic model is distinguished as the first kind of CHs. After the death of the first kind of CHs, stage two is executed to find a new set of CHs based on survival time estimation. Considering each Voronoi cell as a cluster does not minimize the intra-cluster communication distances. This ultimately leads to early FND performance. Unlike DCHSM, cluster formation in the proposed protocol considers node locations in order to minimize the intra-cluster communication distances.

2.1 Fuzzy based clustering

The name fuzzy itself explains the vagueness of any quantity. Fuzzy techniques are more suitable for WSNs because the after deployment situations might not be as crisp as the ideal considerations (Valdez et al. 2014; Xu and Wunsch 2005). Thus, the uncertainty in WSNs can be handled in a better way by the logical methods of the fuzzy theory (Su and Zhao 2018; Moh 2014; Havens et al. 2012; Lee and Cheng 2012). Fuzzy clustering is further classified as FCM based clustering and FLS based clustering.

2.1.1 FCM based clustering

The FCM algorithm-based energy-efficient routing protocol for WSN is proposed by Moh (2014). The clusters are formed using the FCM algorithm. The node located closest to its cluster center is then selected as CH. The CH from the previous round calculates the average energy level of all the cluster members. It then runs an objective function to select the CH for the next round. The protocol forms clusters with reduced intra-cluster communication distances but overloads the current CHs with iterative computations on behalf of all cluster members to elect CHs for the next round. In the proposed protocol, a CH operates only for the current round.

An Optimal Clustering Mechanism based on FCM (OCMFCM) is proposed by Su and Zhao (2018). The initial cluster center points are chosen in the area with densely placed sensor nodes. The CH selection depends on the energy level and the node density of the network. This consideration forms inappropriate clusters in later rounds as the node density decreases. The sensor nodes placed in the dense region start to die out early, and soon only a few sensor nodes remain in the area. Thus, the algorithm is forcibly halted on reaching the maximum number of iterations, which leads to poor clustering. In the proposed work, clusters are formed once and hence the FCM algorithm does not have to be forcibly halted.

A modified LEACH with expected remaining energy estimation is proposed by Lee and Cheng (2012). A pre-specified threshold and the predicted expected energy are used to finalize CHs in the network. One of the fuzzy inputs is the difference between the eligible node’s remaining energy and its expected remaining energy. After a few rounds, nodes are alive with moderate energy. Such nodes do not participate in the CH selection process. This is because the CH selection criteria depend on a high predicted value of remaining energy. In the proposed work, CHs are selected based on FLS. Due to fuzzification of the CH selection parameters, most of the worst and fair situations can be considered. Thus, appropriate CHs are selected even at later rounds.

The FCM algorithm-based Clustering (FCMC) protocol is proposed to enhance the network lifetime (Rajput and Kumaravelu 2019). The CH is selected based on the perceived probability model. In initial rounds, the nodes from the moderate region are selected as CHs, while in later rounds the CHs are selected based on energy levels of the nodes located in the cluster. In initial rounds, the same nodes are consecutively selected as CHs. Due to this, the FND performance is degraded drastically. In the proposed work, the fuzzy rules for CH selection are derived in such a way that a sensor node with adequate energy is selected as CH for every cluster in each round.

2.1.2 FLS based clustering

An FLS-based CH selection protocol is proposed by Tamanndani et al. (2017). The CH selection procedure is carried out in two steps. In step one, an FLS is executed to select a set of eligible CH nodes depending on their remaining energy, node density and distance toward the sink. The final CHs are then elected, in step two, by executing another FLS that has the inputs vulnerability index, cluster centrality, and distance among the CHs. The protocol considers real-time, uncertain, and vulnerable conditions at the FOI, which makes CH selection more robust. But the dependency on the neighbor nodes degrades the performance in large scale scenarios. The protocol has 2 FLSs, each defined using 3 fuzzy inputs. Further, each fuzzy input is defined using 3 membership functions. Also, 27 fuzzy rules are derived for each FLS. Thus, a total of 18 membership functions and 54 fuzzy rules are defined for the CH selection process. A similar protocol is proposed by Zhang et al. (2017), where CH selection is optimized considering too many parameters. This considerably increases the computational and time complexity of the system. In the proposed work, the complexity is kept low by considering only 1 FLS with 3 fuzzy inputs. Each input is defined using 3 membership functions and 27 fuzzy rules are derived from the given inputs.

A Fuzzy Logic-based Energy-efficient Clustering Hierarchy (FLECH) protocol is proposed by Balakrishnan and Balachandran (2017). The fuzzy inputs used are remaining energy, the centrality of the node and distance to the sink. The centrality of a node depends on the network dimension and the number of neighbors that are at a one-hop distance. The pre-specified threshold mechanism is similar to the one proposed by Lee and Cheng (2012). Consideration of physical network dimensions for calculating node centrality might not be a realistic solution as nodes are randomly deployed over the FOI. The proposed work uses a semi-distributed clustering protocol, where clusters are formed by the sink. Thus, the cluster computation load of the nodes is reduced.

A modified LEACH protocol based on FLS is proposed by Nayak and Devulapalli (2016). The FLS based CH selection improves network lifetime over the classical LEACH protocol. A fire detection WSN protocol is proposed by Haifeng et al. (2018). Multiple FLSs are used in the prediction of fire in FOI. The protocol has high complexity due to many input parameters of multiple FLSs. A CH selection based on fuzzy logic for improving network lifetime and reducing the number of CHs in the network is proposed by Murugaanandam and Ganapathy (2019). It uses multiple criteria decision making FLS to eliminate the CH selection. This process increases the reliability of the network.

From the literature studied, it is seen that the clustering protocols based on the FCM algorithm have reduced intra-cluster communication distances. But the major problem is faced in the CH selection phase. The CHs die out at early rounds, which affects network stability. This happens because either the CHs are overburdened (Moh 2014) or the selection criteria are not properly determined (Valdez et al. 2014). The protocols based on FLS methodology include the affiliations of different parameters to the mathematical methods. Thus, the uncertain and harsh environmental conditions that affect the RSS can be tackled effectively by using FLS. The fuzzy rules can be derived considering a wide range of ambiguity in the input parameters assumed for the CH selection criteria. But the cluster formation in such protocols does not guarantee the reduction in intra-cluster communication distances (Balakrishnan and Balachandran 2017).

This study motivated us to propose a semi-distributed clustering protocol. In this work, the FCM algorithm is used to cluster the sensor nodes in the given FOI. The FCM algorithm reduces the intra-cluster communication distances significantly (Rajput and Kumaravelu 2019). The CH selection is modeled using FLS. The vagueness in communication distances occurs due to uncertain communication links, the effect of meteorological conditions, failure of a node, etc. Thus, the battery’s remaining energy and communication distances are the two important parameters that are utilized in a fuzzy manner to elect proper CHs in the network. While reducing communication costs, care is taken that the computational cost is not increased unreasonably.

3 System model

The radio circuit embedded on a sensor node is able to transmit over 10–100 m. The communication model suitable for this range is free space propagation or multipath fading channel models (Lanzisera and Pister 2007; Tudose et al. 2013). A typical radio communication model used in our proposed work is shown in Fig. 2. A transmitter and receiver are placed at a distance of d. Figure 2 is followed by the illustration of energy consumption calculation for one-time data transmission and reception by sensor nodes in the network.

Fig. 2 Radio communication model

The total transmission energy required during one round consists of energy consumed for data transmission from cluster members to CH and CH to sink. The electronic circuits require a small amount of battery energy to run the device. The power amplifiers at the transmitter also consume energy for transmitting the signal. At the receiver side, energy is used for data reception and aggregation by the processing unit. The energy utilized by power amplifiers at the transmitter front end circuit is given as,

$${E}_{AMP}= \left\{\begin{array}{ll}{E}_{fs} {d}^{2} ,&\quad d \le {d}_{0}\\ {E}_{mp} {d}^{4},&\quad d > {d}_{0}\end{array}\right.$$
(2)

where \({E}_{AMP}\) is the energy utilized by power amplifiers. d is the separation distance between transmitter and receiver. \({E}_{fs}\) and \({E}_{mp}\) are the energy utilization factors of free space propagation and multipath fading channel models respectively. \({d}_{0}\) is the reference distance. The energy consumed to transmit k bits from a cluster member to CH is calculated as,

$${E}_{Tx}=k \left({E}_{ELEC}+ {E}_{AMP}\right)$$
(3)

where \({E}_{Tx}\) is the energy required by a node to transmit k bits. \({E}_{ELEC}\) is the energy required by the hardware electronic circuits. The energy utilized by a CH to receive k bits is calculated as,

$${E}_{CH\_Rx}=k {E}_{ELEC}$$
(4)

where \({E}_{CH\_Rx}\) is the energy required by CH to receive k bits. The CH further aggregates the cluster data into a single data packet and transmits it to the sink. Thus, energy utilized by a CH for transmission of data towards sink is calculated as,

$${E}_{CH\_Tx}=k \left({E}_{ELEC}+ {E}_{AMP}+ {E}_{AGGR}\right)$$
(5)

where, \({E}_{CH\_Tx}\) is the total energy spent by a CH to transmit cluster data to the sink. \({E}_{AGGR}\) is the energy used for data aggregation. The total energy utilization for one round is formulated as shown in (6). For an alive network with N nodes and M CHs, the total energy spent on data transmission from cluster members to sink via CHs is given as,

$${E}_{TOTAL}=\left(N-M\right) \left[{E}_{Tx}+ {E}_{CH\_Rx}\right]+M {E}_{CH\_Tx}$$
(6)

where \({E}_{TOTAL}\) is the total energy consumed by the WSN for one round. The first term in (6) states that there are \(\left(N-M\right)\) transmissions and receptions due to intra-cluster communications. The second term is the energy required for M transmissions toward the sink by CHs.
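The energy model in (2) to (6) can be sketched in Python as follows. This is an illustrative sketch only: the class and function names are hypothetical, and the default radio parameter values are placeholder assumptions, not values taken from this section. Unlike (6), which implicitly uses one representative distance for every link, the sketch sums over per-node distances; when all member distances and all CH-to-sink distances are equal, it reduces to (6).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Radio:
    # All default values below are illustrative assumptions.
    e_elec: float = 50e-9      # J/bit, electronics energy E_ELEC
    e_fs: float = 10e-12       # J/bit/m^2, free space factor E_fs
    e_mp: float = 0.0013e-12   # J/bit/m^4, multipath factor E_mp
    e_aggr: float = 5e-9       # J/bit, aggregation energy E_AGGR
    d0: float = 87.7           # m, reference distance d_0


def amp_energy(d, radio):
    """Per-bit amplifier energy E_AMP, Eq. (2)."""
    if d <= radio.d0:
        return radio.e_fs * d ** 2
    return radio.e_mp * d ** 4


def tx_energy(k, d, radio):
    """Energy for a cluster member to send k bits over distance d, Eq. (3)."""
    return k * (radio.e_elec + amp_energy(d, radio))


def ch_rx_energy(k, radio):
    """Energy for a CH to receive k bits, Eq. (4)."""
    return k * radio.e_elec


def ch_tx_energy(k, d, radio):
    """Energy for a CH to aggregate and send k bits to the sink, Eq. (5)."""
    return k * (radio.e_elec + amp_energy(d, radio) + radio.e_aggr)


def round_energy(k, member_dists, ch_sink_dists, radio):
    """Total energy for one round, following the structure of Eq. (6).

    member_dists: distance of each of the (N - M) members to its CH.
    ch_sink_dists: distance of each of the M CHs to the sink.
    """
    intra = sum(tx_energy(k, d, radio) + ch_rx_energy(k, radio)
                for d in member_dists)
    to_sink = sum(ch_tx_energy(k, d, radio) for d in ch_sink_dists)
    return intra + to_sink
```

Because the amplifier term grows with the second or fourth power of distance, shortening intra-cluster distances has a larger effect on the total than reducing the electronics energy.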

4 Proposed protocol

The semi-distributed WSN clustering protocol is proposed for environmental monitoring applications. For such applications, the sink may be located inside or outside the deployed WSN. The sensor nodes are assumed to follow the IEEE 802.15.4 standard and thus can operate as a Full-Function Device (FFD) or a Reduced-Function Device (RFD) to conserve their battery life (IEEE-SA 2015).

Initially, sensor nodes are randomly deployed in the given FOI. It is assumed that all the nodes are equipped with Global Positioning System (GPS) and thus know their own locations (Su and Zhao 2018). The sink is located near the FOI. The proposed protocol is executed in two phases: the setup phase and the data transmission phase, as shown in Fig. 3. In the setup phase, clusters are formed based on the FCM algorithm. This phase is executed only once, at the start of the protocol. In this phase, the sink floods the HELLO packet in the network. The HELLO packet consists of the Sink ID (SID) and the sink’s location. Each node then responds to the sink by transmitting a NODE_HELLO packet. As sensor nodes have a limited transmission range, it is assumed that all the nodes relay the NODE_HELLO packets received from other nodes toward the sink (Balakrishnan and Balachandran 2017; Chang 2015). The NODE_HELLO packet consists of the Node ID (NID) and its location information. Thus, the sink receives the location information of all the nodes in the network. The sink then executes the FCM algorithm, which is illustrated in Sect. 4.1. After determining the clusters, the sink multicasts the CLUSTER_INFO packets in the network. The number of CLUSTER_INFO packets generated by the sink is equal to the number of clusters formed in the network. Each CLUSTER_INFO packet consists of the Cluster ID (CID), the location of the cluster center point and the NIDs of the cluster members associated with that CID. Thus, at the end of the setup phase, each node knows its cluster members and the location of its cluster center point. It is affordable to run the FCM algorithm at the sink and invest a one-time communication cost in relaying the network information to form static clusters. Due to this, the computation and communication cost required in every round for cluster formation is reduced.

Fig. 3 Phases of the proposed protocol

In the data transmission phase, CHs are selected in a distributed manner. Each node executes the FLS to calculate its Selection Value (SV) to become CH, which is explained in Sect. 4.2. The FLS is defined using three inputs: energy level, distance toward the cluster center point and distance from the sink. The distances are considered as fuzzy inputs because the energy utilized is mainly affected by the distances (Tudose et al. 2013). Each node multicasts a CH_VALUE packet to its cluster members. This packet consists of the NID and the SV of the node. After exchanging the SVs, all the cluster members compare their own SV with the received SVs. The node with the highest SV declares itself as CH and multicasts a CH_DECLARATION packet including its NID and CID. The selected CH then schedules a Time Division Multiple Access (TDMA) frame for data collection from the cluster members. The signaling packets used in this protocol are listed in Table 1. As static cluster formation is carried out at the sink and CHs are selected independently in each round, the proposed protocol executes in a semi-distributed manner. The communication carried out among the cluster members for CH selection is shown in Fig. 4. The following assumptions are considered in the proposed protocol:

Fig. 4 Communication between cluster members for CH selection

Table 1 Signaling packets used in the proposed protocol

  1.

    Sensor nodes are randomly deployed in the FOI.

  2.

    Sink and sensor nodes are static after deployment.

  3.

    Sensor nodes are homogeneous in nature and measure environmental parameters like temperature, humidity, air pollutants, etc.

  4.

    Sensor nodes follow hardware specifications as per IEEE 802.15.4 standard. The CHs operate in FFD mode while cluster members operate in RFD mode (IEEE-SA 2015).

  5.

    The distance between the sensor node and the sink is computed based on RSS (Xu et al. 2010).

  6.

    All sensor nodes have the same initial energy during deployment.

  7.

    All the nodes are equipped with GPS and thus are familiar with their node locations (Su and Zhao 2018; Chang 2015).
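Returning to the distributed CH election of the data transmission phase: once the CH_VALUE packets have been exchanged, every cluster member holds the same table of SVs and can reach the same decision locally. The Python sketch below illustrates that comparison. The function name is hypothetical, and breaking an exact tie by the lowest NID is an assumption added only to make the sketch deterministic; the protocol itself relies on the FLS membership functions to avoid ties.

```python
def elect_cluster_head(selection_values):
    """Return the NID that every cluster member would agree on as CH.

    selection_values maps NID -> SV, as collected from CH_VALUE packets
    (including the node's own SV). The highest SV wins; an exact tie is
    broken by the lowest NID (assumption, not part of the protocol text).
    """
    return min(selection_values,
               key=lambda nid: (-selection_values[nid], nid))


def is_cluster_head(own_nid, selection_values):
    """True if this node should multicast the CH_DECLARATION packet."""
    return elect_cluster_head(selection_values) == own_nid
```

Since each member evaluates the same function on the same inputs, no further negotiation round is needed before the CH_DECLARATION packet is sent.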

4.1 FCM algorithm based cluster formation

The FCM algorithm is basically used to categorize data into a fixed number of clusters. The algorithm associates each input with one of the cluster center points. Here, FCM is used to categorize the N sensor nodes into M clusters. Inputs to the FCM algorithm are the node locations and the number of clusters. Each node location has two-dimensional coordinates. The membership degree value is calculated based on the distance toward the center point and lies within the range [0, 1]. The membership is assigned in such a way that the sum of all the membership degree values of a single node over all center points equals one. A membership degree value of 0 indicates null membership (the node is far from the center point), while 1 indicates full membership (the node is near the center point). In-between values indicate a membership degree that depends on the distance. The algorithm is executed in the following manner. The first input is the set of node locations, given as,

$$\varvec{A}= \left\{{a}_{1}, {a}_{2}, \dots , {a}_{i}, \dots , {a}_{N}\right\} \subseteq {R}^{N\times 2}$$
(7)

where A is a matrix of dimension N × 2 and consists of node locations. N is the total number of sensor nodes in the network. The initial M random center points are considered to form a matrix C. It is given as,

$$\varvec{C}= \left\{{c}_{1}, {c}_{2}, \dots , {c}_{j}, \dots , {c}_{M}\right\}$$
(8)

The two matrices A and C from (7) and (8) are given as input to the objective function (OF). The OF with respect to membership degree value \({W}_{ij}\) and distance \({L}_{ij}\) is formulated as

$$OF= \sum _{i=1}^{N}\sum _{j=1}^{M}{\left({W}_{ij}\right)}^{\alpha } {{L}_{ij}}^{2} \left({a}_{i}, {c}_{j}\right)$$
(9)

where \({W}_{ij}\) is the degree of membership with which the sensor node \({a}_{i}\) pertains to the cluster center point \({c}_{j}\). \(\alpha\) is the fuzzy factor having a value within the range \(\left[1, \infty \right)\). It is the weighting exponent that affects the performance of the algorithm. For clustering without any special conditions, \(\alpha\) is considered to be 2 (Zhou 2012). \({{L}_{ij}}^{2} \left({a}_{i}, {c}_{j}\right)\) is the squared dissimilarity (distance) between node \({a}_{i}\) and center point \({c}_{j}\). The membership matrix \({W}_{ij}\) and the iterative center points are calculated as,

$${W}_{ij}= \frac{1}{\sum _{k=1}^{M}{\left(\frac{\Vert{a}_{i}- {c}_{j}\Vert}{\Vert{a}_{i}- {c}_{k}\Vert}\right)}^{\frac{2}{\alpha -1}}}$$
(10)
$${c}_{j}= \frac{\sum _{i=1}^{N}{{W}_{ij}}^{\alpha } {a}_{i}}{\sum _{i=1}^{N}{{W}_{ij}}^{\alpha }}$$
(11)

In order to minimize the objective function OF, its partial derivatives with respect to \({W}_{ij}\) and \({c}_{j}\) are set to zero, which yields the update rules (10) and (11); these are executed iteratively. In (10), \({c}_{k}\) (k = 1, 2, …, M) denotes the kth cluster center point calculated in the past iteration. The iterations are performed subject to the following conditions:

$$\sum _{j=1}^{M}{W}_{ij}=1,\quad i=1, 2,\ldots , N$$
(12)
$$0 \le {W}_{ij} \le 1, i=1, 2,\ldots , N \quad and\quad j=1, 2,\ldots , M$$
(13)

The condition in (12) is used to remove node isolation issues in the network. The degree of membership of a node towards all cluster center points is distributed within a unity value. Each node has M membership degree values. The node is assigned to the cluster center point for which it has the highest membership degree. Therefore, all the participating sensor nodes are associated with only one of the clusters in the network. The condition in (13) limits the degree of membership within the range of 0–1. This validates the condition in (12) and associates every node with only one cluster in the network. For every iteration, the convergence improvement of the OF is observed by subtracting the past OF value from the present OF value. The iterations are halted if the difference obtained is less than the minimum improvement threshold. This threshold is set to \(1 \times {10}^{-5}\). This value is selected by experimentation and it results in about 65–80 iterations on average for a network with 100 nodes deployed in a 100 m × 100 m FOI. The proposed FCM algorithm for cluster formation is given in Table 2.
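The iteration described by (9) to (13) can be illustrated with a short, dependency-free Python sketch. It is independent of the listing in Table 2 and is not the authors' implementation. The function name, the `init_centers` parameter, the iteration cap, the choice of random node locations as initial center points, and the handling of a node that coincides exactly with a center point are all assumptions made for the sketch.

```python
import math
import random


def fcm(points, m_clusters, alpha=2.0, min_improvement=1e-5,
        max_iter=100, init_centers=None, seed=0):
    """Fuzzy c-means on 2-D node locations.

    Returns (centers, W, labels): W[i][j] is the membership of node i in
    cluster j, and labels[i] is the cluster with the highest membership.
    """
    if init_centers is None:
        # Assumption: initial random centers are drawn from node locations.
        centers = random.Random(seed).sample(points, m_clusters)
    else:
        centers = list(init_centers)
    exponent = 2.0 / (alpha - 1.0)  # requires alpha > 1
    prev_of = None
    W = []
    for _ in range(max_iter):
        # Membership update, Eq. (10), using centers of the past iteration.
        W = []
        for a in points:
            dists = [math.dist(a, c) for c in centers]
            if 0.0 in dists:
                # Node sits exactly on a center: give it full membership
                # there (assumption to avoid division by zero).
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [w / total for w in row]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                       for dj in dists]
            W.append(row)
        # Center update, Eq. (11).
        centers = []
        for j in range(m_clusters):
            weights = [row[j] ** alpha for row in W]
            total = sum(weights)
            centers.append(tuple(
                sum(w * a[dim] for w, a in zip(weights, points)) / total
                for dim in range(2)))
        # Objective function, Eq. (9), and the halting test.
        of = sum(W[i][j] ** alpha * math.dist(points[i], centers[j]) ** 2
                 for i in range(len(points)) for j in range(m_clusters))
        if prev_of is not None and abs(prev_of - of) < min_improvement:
            break
        prev_of = of
    # Each node joins the cluster with its highest membership degree.
    labels = [max(range(m_clusters), key=lambda j: row[j]) for row in W]
    return centers, W, labels
```

Each row of `W` sums to one by construction of (10), which is how condition (12) guarantees that no node is left without a cluster.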

Table 2 FCM algorithm

4.2 FLS based CH selection

FLSs are used as decision making systems, where most of the worst and fair input conditions can be considered to get desired output. Among the two popular fuzzy models (Mamdani and Assilian 1975; Takagi and Sugeno 1985), Mamdani fuzzy model is widely used in many applications. The fuzzy logic is derived from the ‘If-Then’ statements which relate the input conditions to a favorable outcome. In our proposed protocol, for every round, each sensor node executes FLS to calculate its SV to become a CH. The crisp input parameters considered for FLS are the node’s remaining battery energy, distance from cluster center point and distance toward the sink. The crisp input parameters are normalized within the range of [0, 1]. The normalization of crisp values allows the generalization of proposed FLS for scalable networks. The first fuzzy input is the node’s Battery-energy Ratio (BR). It is normalized by taking the ratio of the node’s remaining energy to its initial energy as shown in (14). This input directly affects the network lifetime because the node is declared dead if it completely drains off its battery energy.

$$BR= \frac{{E}_{remain}}{{E}_{initial}}$$
(14)

where BR is the ratio of node’s remaining battery energy to its initial energy. \({E}_{remain}\) is the node’s remaining battery energy and \({E}_{initial}\) is the initial battery energy during deployment.

The second fuzzy input is the Cluster-distance Ratio (CR). It is the ratio of the node’s distance from its cluster center to maximum intra-cluster distance. This input decides the centrality of the node among all the cluster members. If a node is located close to the cluster center point, it means that it has more reachability towards the cluster members.

$$CR= \frac{{D}_{j}^{i}}{{d}_{max}}$$
(15)

where CR is the ratio of the node’s distance from its cluster center point to the maximum intra-cluster distance. \({D}_{j}^{i}\) is the distance between the ith node and the jth cluster center point. \({d}_{max}\) is the maximum distance between a cluster member and the jth cluster center point.

The third input considered is the Sink-distance Ratio (SR). The location of the sink has an impact on the energy utilization of the overall network. Thus, this parameter is significant and considered as a third fuzzy input. It is normalized as,

$$SR= \frac{{D}_{sink}}{{\delta }_{{sink}_{max}}}$$
(16)

where SR is the ratio of the node’s distance towards the sink to a maximum distance of a node from the sink in the network. \({D}_{sink}\) is the distance of the node towards the sink. \({\delta }_{{sink}_{max}}\) is the maximum distance that a node can have towards the sink from the FOI. The modeled FLS is as shown in Fig. 5.
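
The three normalizations in (14)–(16) can be sketched in Python as follows. The function name `fuzzy_inputs`, the coordinates, the energy values and the two maximum distances are hypothetical, chosen only so that the ratios are easy to check by hand. The clamping step is an added safeguard, not part of the equations.

```python
import math

def fuzzy_inputs(e_remain, e_initial, node, center, d_max, sink, d_sink_max):
    """Return (BR, CR, SR) as defined in (14), (15) and (16)."""
    br = e_remain / e_initial                 # (14) battery-energy ratio
    cr = math.dist(node, center) / d_max      # (15) cluster-distance ratio
    sr = math.dist(node, sink) / d_sink_max   # (16) sink-distance ratio
    # Clamp to [0, 1] to guard against rounding or distance-estimation error.
    return tuple(min(max(v, 0.0), 1.0) for v in (br, cr, sr))

# Hypothetical node with 0.3 J left of 0.5 J, 12 m from its cluster center
# (maximum intra-cluster distance 40 m) and 120 m from a sink at (50, 150)
# (maximum node-to-sink distance taken as 150 m).
br, cr, sr = fuzzy_inputs(0.3, 0.5, (50.0, 30.0), (50.0, 42.0), 40.0,
                          (50.0, 150.0), 150.0)
```

With these illustrative numbers the three inputs are approximately 0.6, 0.3 and 0.8.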

Fig. 5 FLS for CH selection

$${X}_{1}\left(x\right)= \left\{\begin{array}{ll}1 , &\quad x \le 0.1\\ \frac{0.4-x}{0.3} ,&\quad 0.1\le x \le 0.4\\ 0 ,&\quad x\ge 0.4\end{array}\right.$$
(17)
$${X}_{2}\left(x\right)= \left\{\begin{array}{ll}0 ,&\quad x\le 0.1\\ \frac{x-0.1}{0.3} ,&\quad 0.1\le x \le 0.4\\ 1 ,&\quad 0.4\le x \le 0.6\\ \frac{0.9-x}{0.3} ,&\quad 0.6\le x \le 0.9\\ 0 ,&\quad x\ge 0.9\end{array}\right.$$
(18)
$${X}_{3}\left(x\right)= \left\{\begin{array}{ll}0 , &\quad x \le 0.6\\ \frac{x-0.6}{0.3} ,&\quad 0.6\le x \le 0.9\\ 1 ,&\quad x\ge 0.9\end{array}\right.$$
(19)

Each of the fuzzy inputs BR, CR and SR is defined using three linguistic variables, named Low, Moderate and High. Each linguistic variable is defined using a trapezoidal membership function. The distribution of the membership functions over the normalized input range is shown in Fig. 6. The crisp values of these inputs are uncertain, because energy is continuously utilized and communication distances are estimated from RSS. Thus, all these inputs are normalized within the range of 0–1 using (14), (15) and (16). As all the inputs are normalized within the same range, the same set of membership functions is used for all three fuzzy inputs. The membership functions for Low, Moderate and High are defined mathematically by (17), (18) and (19) respectively, where the variable x takes a value within the interval [0, 1].
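
As an illustration, the piecewise definitions in (17)–(19) translate directly into Python. The function names `low`, `moderate`, `high` and `fuzzify` are chosen here for readability and are not taken from the original work.

```python
def low(x):
    """Eq. (17): full membership up to 0.1, falling linearly to 0 at 0.4."""
    if x <= 0.1:
        return 1.0
    if x >= 0.4:
        return 0.0
    return (0.4 - x) / 0.3

def moderate(x):
    """Eq. (18): rises over 0.1-0.4, flat over 0.4-0.6, falls over 0.6-0.9."""
    if x <= 0.1 or x >= 0.9:
        return 0.0
    if x < 0.4:
        return (x - 0.1) / 0.3
    if x <= 0.6:
        return 1.0
    return (0.9 - x) / 0.3

def high(x):
    """Eq. (19): zero up to 0.6, rising linearly to full membership at 0.9."""
    if x <= 0.6:
        return 0.0
    if x >= 0.9:
        return 1.0
    return (x - 0.6) / 0.3

def fuzzify(x):
    """Degrees of membership of a normalized input in the three fuzzy sets."""
    return {"Low": low(x), "Moderate": moderate(x), "High": high(x)}

print(fuzzify(0.5))  # {'Low': 0.0, 'Moderate': 1.0, 'High': 0.0}
```

With these breakpoints, the three degrees of membership sum to one (up to floating-point rounding) for every x in [0, 1], so an input is never left without a fuzzy set.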

Fig. 6 Membership functions for all the fuzzy inputs of the FLS

The fuzzy output of the FLS is the SV, which is defined using five linguistic variables: Very Low, Low, Medium, High and Very High. A triangular membership function is used to define each of these linguistic variables. The SV obtained from the FLS is compared with the SVs of the other cluster members, and the node with the highest SV within the cluster is declared as CH. If a trapezoidal membership function were used, the SV values might be approximately the same for all the nodes. This may lead to tie cases and ambiguity in CH selection and, hence, to the selection of improper CHs in the network. To avoid these situations, the output is defined using a triangular function rather than a trapezoidal function. The linguistic variables along the axis of the degree of membership for SV are shown in Fig. 7. The membership functions for Very Low, Low, Medium, High, and Very High are mathematically represented by (20)–(24) respectively, where y takes a value within the interval [0, 1].

$${Y}_{1}\left(y\right)= \left\{\begin{array}{ll} 1 ,&\quad y\le 0 \\ \frac{0.2-y}{0.2} ,&\quad 0\le y \le 0.2\\ 0 ,&\quad y\ge 0.2 \end{array}\right.$$
(20)
$${Y}_{2}\left(y\right)= \left\{\begin{array}{ll}0 ,&\quad y\le 0.1 \\ \frac{y-0.1}{0.15} ,&\quad 0.1\le y \le 0.25\\ 1 ,&\quad y=0.25\\ \frac{0.4-y}{0.15} ,&\quad 0.25\le y \le 0.4 \\ 0 ,&\quad y\ge 0.4 \end{array}\right.$$
(21)
$${Y}_{3}\left(y\right)= \left\{\begin{array}{ll}0 ,&\quad y \le 0.25\\ \frac{y-0.25}{0.25} ,&\quad 0.25\le y \le 0.5 \\ 1 ,&\quad y=0.5 \\ \frac{0.75-y}{0.25} ,&\quad 0.5\le y \le 0.75 \\ 0 ,&\quad y\ge 0.75\end{array}\right.$$
(22)
$${Y}_{4}\left(y\right)= \left\{\begin{array}{ll}0 ,&\quad y\le 0.6\\ \frac{y-0.6}{0.15} ,&\quad 0.6\le y \le 0.75\\ 1 ,&\quad y=0.75\\ \frac{0.9-y}{0.15} ,&\quad 0.75\le y \le 0.9 \\ 0 ,&\quad y\ge 0.9\end{array}\right.$$
(23)
$${Y}_{5}\left(y\right)= \left\{\begin{array}{ll} 0 ,&\quad y \le 0.8\\ \frac{y-0.8}{0.2} ,&\quad 0.8\le y \le 1 \\ 1 ,&\quad y\ge 1 \end{array}\right.$$
(24)
Fig. 7 Membership functions of SV
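
The five output sets in (20)–(24) share one shape, so they can be sketched with a single helper. In the Python sketch below, the helper `tri` and the table `SV_SETS` are illustrative names. The breakpoints are read from (20)–(24), with Very Low and Very High written as triangles whose peak coincides with one foot.

```python
def tri(y, a, b, c):
    """Triangle with feet at a and c and peak at b (a == b or b == c gives a shoulder)."""
    if y == b:
        return 1.0
    if y <= a or y >= c:
        return 0.0
    if y < b:
        return (y - a) / (b - a)
    return (c - y) / (c - b)

# (feet and peak) for each linguistic variable of SV, valid for y in [0, 1].
SV_SETS = {
    "Very Low": (0.0, 0.0, 0.2),     # (20)
    "Low": (0.1, 0.25, 0.4),         # (21)
    "Medium": (0.25, 0.5, 0.75),     # (22)
    "High": (0.6, 0.75, 0.9),        # (23)
    "Very High": (0.8, 1.0, 1.0),    # (24)
}

def sv_memberships(y):
    """Degree of membership of an SV value y in each of the five output sets."""
    return {name: tri(y, *abc) for name, abc in SV_SETS.items()}

m = sv_memberships(0.3)  # partial membership in Low and in Medium only
```

Each triangle reaches a degree of one at a single point only, unlike a trapezoid, which has a flat top over an interval.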

The above-defined inputs and output are mapped to each other by deriving If-Then rules. There are three fuzzy input variables, each with three membership functions, and therefore 3 × 3 × 3 = 27 rules can be derived as shown in Table 3. If the number of inputs is increased, the number of rules also increases, which increases the computational complexity significantly. Thus, to minimize the complexity of the system, a minimum but sufficient set of inputs is considered compared to similar conventional protocols (Zhang et al. 2017). A rule viewer diagram displaying the mapping of the fuzzy inputs and output is given in Fig. 8. The sample values considered for BR, CR, and SR are 0.6, 0.3 and 0.8 respectively. The degree of membership of BR for the linguistic variable set [Low, Moderate, High] is [0, 1, 0] respectively. Similarly, the degree of membership of CR for the linguistic variable set [Low, Moderate, High] is [0.33, 0.67, 0] respectively. Also, the degree of membership of SR for the linguistic variable set [Low, Moderate, High] is [0, 0.33, 0.67] respectively. According to the fuzzy rules, the sample input values map to rule numbers 11, 12, 14 and 15. The corresponding linguistic variable of the SV is Low, and the SV is determined to be 0.383.
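
The sample mapping can be reproduced with a short Python sketch. Two assumptions are made here that this section does not state. First, the 27 rules are numbered in lexicographic order of (BR, CR, SR) over [Low, Moderate, High], with BR varying slowest. This ordering is consistent with the rule numbers quoted above. Second, the three antecedents of a rule are combined with the minimum operator, as is common in Mamdani systems. The rule consequents of Table 3 are not reproduced here, so the sketch stops at the firing strengths and does not compute the SV.

```python
from itertools import product

def trap(x, a, b, c, d):
    """Trapezoid rising on [a, b], flat on [b, c] and falling on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Breakpoints read from (17)-(19).
SETS = {
    "Low": (0.0, 0.0, 0.1, 0.4),
    "Moderate": (0.1, 0.4, 0.6, 0.9),
    "High": (0.6, 0.9, 1.0, 1.0),
}
NAMES = ("Low", "Moderate", "High")

def fired_rules(br, cr, sr):
    """Map rule number -> (antecedent labels, firing strength) for fired rules."""
    fired = {}
    # Assumed numbering: rule 1 = (Low, Low, Low), the SR label varies fastest.
    for number, labels in enumerate(product(NAMES, repeat=3), start=1):
        degrees = [trap(x, *SETS[name]) for x, name in zip((br, cr, sr), labels)]
        strength = min(degrees)  # assumed AND operator
        if strength > 0.0:
            fired[number] = (labels, strength)
    return fired

fired = fired_rules(0.6, 0.3, 0.8)
print(sorted(fired))  # [11, 12, 14, 15]
```

Under these assumptions, rule 15 (Moderate, Moderate, High) fires with a strength of about 0.67 and the other three fire with a strength of about 0.33.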

Fig. 8 Sample mapping of fuzzy inputs and output using rule viewer

Table 3 The ruleset for CH selection

The node with the highest SV is selected as CH in each cluster. The CH switches to FFD mode while the cluster members switch to RFD mode. The CH node then schedules TDMA slots for the cluster members for data collection. After transmitting its data to the CH in its assigned time slot, each cluster member goes into sleep mode. The CH aggregates the cluster data and forwards it to the sink. The summary of the proposed protocol is given in Table 4.

Table 4 Summary of the proposed protocol
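
The per-round CH election and slot assignment can be sketched in Python as follows. The function names, node identifiers and SV values are hypothetical. Ties are broken in favor of the node encountered first, and slots are assigned in order of node identifier. Both choices are assumptions made for this sketch and are not specified in this section.

```python
def select_cluster_heads(cluster_of, sv):
    """Return {cluster id: node id of the member with the highest SV}."""
    heads = {}
    for node, cluster in cluster_of.items():
        best = heads.get(cluster)
        if best is None or sv[node] > sv[best]:
            heads[cluster] = node
    return heads

def tdma_schedule(cluster_of, heads):
    """Give every non-CH member a slot index within its own cluster."""
    slots, next_slot = {}, {}
    for node in sorted(cluster_of):
        cluster = cluster_of[node]
        if node == heads[cluster]:
            continue  # the CH receives during all slots and does not need one
        slots[node] = next_slot.get(cluster, 0)
        next_slot[cluster] = slots[node] + 1
    return slots

# Hypothetical round: five nodes in two clusters with their computed SVs.
cluster_of = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
sv = {1: 0.38, 2: 0.62, 3: 0.55, 4: 0.41, 5: 0.70}
heads = select_cluster_heads(cluster_of, sv)   # {0: 2, 1: 5}
slots = tdma_schedule(cluster_of, heads)       # {1: 0, 3: 1, 4: 0}
```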

4.3 Computational complexity analysis

The proposed protocol is based on the FCM algorithm and FLS. Both are iterative algorithms and thus, the computational complexity of the proposed protocol is discussed in this sub-section. The proposed protocol is semi-distributed: the FCM algorithm is executed by the sink and the FLS for CH selection is executed by each node. Thus, the complexity of both techniques is tabulated separately. The FCM algorithm has a maximum number of iterations denoted by I and is executed only once, at the start of the protocol. The complexity is computed for the maximum number of iterations, though the algorithm may not need all I iterations to converge to the optimum output. Step 4 of Table 2 calculates the objective function OF, where the total number of sensor nodes is N and the total number of clusters formed is M. It uses N(N − 1)(M − 1) additions and NM(\(\alpha\) + 1) multiplications, where \(\alpha\) is the order of the degree of membership function denoted in (9). The objective function computation also involves the membership matrix \({W}_{ij}\) and the iterative cluster center point \({c}_{j}\). The matrix \({W}_{ij}\) requires 4M − 1 additions and 2M + 1 multiplications. The center point \({c}_{j}\) computation requires 2(N − 1) additions and M(2\(\alpha\) − 1) + 1 multiplications. For I iterations, (I − 1) additions and comparisons are executed according to Steps 5 and 6 of Table 2 respectively. This algorithm is executed by the sink and thus nodes are not overburdened with cluster computations.

The FLS has three main functional blocks: fuzzification of inputs, rule inference logic and defuzzification of output. To analyze the complexity, the variables \({I}_{in}\) and \({I}_{o}\) are used to indicate the number of fuzzification and defuzzification iterations respectively (Arthi and Arulmozhivarman 2016; Kim, Ahn, and Kwon 2000). The FLS has three inputs and each input is defined by three membership functions. Thus, 27 rules are derived to infer one output. If the number of fuzzy inputs or linguistic variables is increased, the number of rules also increases, which significantly increases the complexity of the system. The output is defined by five membership functions. The number of additions required for fuzzification is (\(4X{I}_{in}+3X\)) and the number of multiplications is (\(4X{I}_{in}+3X+3\)), where X is the number of membership functions used to define inputs. The number of comparisons required for each fuzzy input is \(\left(X+3X{I}_{in}\right)+4X\). The comparison with zero value is done 4X times, which can be neglected. Thus, the total number of comparisons for all three fuzzy inputs is \(3\left(X+3X{I}_{in}\right)+4X\). The rule inference logic compares incoming values of fuzzy inputs with the set of rules and maps the given input combination to an appropriate fuzzy output value. Thus, the comparisons required are \({FL}_{rules}({I}_{o}-1)\). The corresponding iterative SV values obtained from the ruleset are summed to get the final SV for the given set of inputs. As there are three inputs, the number of additions required by the rule inference logic is \(3{FL}_{rules}{I}_{o}\). In the defuzzification process, the corresponding membership function of Y is checked according to the mapped rules and Y is calculated from the respective membership Eqs. (20)–(24). The numbers of additions and multiplications required for defuzzification of the output are \({FL}_{rules}\left(8Y{I}_{o}+Y\right)\) and \({FL}_{rules}(8Y{I}_{o}+Y+1)\) respectively. The output Y is defined by five linguistic variables and thus, the fuzzy output is converted into a crisp output using \({FL}_{rules}(Y+5Y{I}_{o}-1)\) comparisons. The computational complexity of the proposed protocol is summarized in Table 5.

Table 5 Computational complexity analysis of proposed protocol
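
To give a feel for the magnitudes involved, the FCM operation counts quoted in the text can be evaluated for a concrete network. The Python sketch below simply substitutes into those expressions. N = 100 and \(\alpha\) = 2 match the simulated setup, while M = 5 is a hypothetical cluster count used only for illustration. The function name is also illustrative.

```python
def fcm_operation_counts(n, m, alpha):
    """Additions and multiplications for the three FCM quantities, as quoted in the text."""
    return {
        "objective": {"add": n * (n - 1) * (m - 1), "mul": n * m * (alpha + 1)},
        "membership": {"add": 4 * m - 1, "mul": 2 * m + 1},
        "center": {"add": 2 * (n - 1), "mul": m * (2 * alpha - 1) + 1},
    }

counts = fcm_operation_counts(n=100, m=5, alpha=2)
for name, ops in counts.items():
    print(f"{name:<10} additions={ops['add']:>6} multiplications={ops['mul']:>5}")
```

For these values the objective function term dominates, with 39600 additions and 1500 multiplications, against 19 and 11 for the membership matrix and 198 and 16 for the center point.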

4.4 Time complexity analysis

The FCM algorithm and the FLS are iterative algorithms that take time to derive a final output. For this reason, an analysis of time complexity is carried out in this sub-section. The time complexity is denoted by the function O(). It is a function of the system inputs, outputs and the time-dependent processing variables that are used to execute the proposed protocol. At the start of the protocol, clusters are formed based on the FCM algorithm. The two inputs of the FCM algorithm are the node locations and the number of clusters to be formed in the network. The time complexity of this cluster formation process is calculated as \(O\left(N \times {M}^{2} \times I\right)\), where N is the total number of sensor nodes in the network, M is the number of clusters formed and I is the maximum number of iterations. In every iteration, the center point is replaced so that the algorithm converges to a minimum value of the objective function. All the nodes execute the FLS to calculate the SV used as the CH selection criterion. The FLS has three inputs, each defined by three membership functions. Thus, 27 rules are derived that are used to calculate SV. The time complexity involved in this process is calculated as \(O\left(N \times {FL}_{rules}\right)\). The time complexity analysis is summarized in Table 6.

Table 6 Time complexity analysis of the proposed protocol
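
As a rough illustration only, the two growth terms can be evaluated for N = 100 nodes, I = 100 maximum iterations and \({FL}_{rules}\) = 27, together with a hypothetical M = 5 clusters (M is not fixed in this sub-section):

$$N \times {M}^{2} \times I = 100 \times 25 \times 100 = 2.5 \times {10}^{5}, \qquad N \times {FL}_{rules} = 100 \times 27 = 2700$$

These figures indicate how the terms scale and are not exact operation counts. The larger term belongs to the FCM step, which is run once at the sink, whereas the smaller term is shared among the nodes in each round.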

5 Simulation results and performance analysis

The proposed protocol is intended for environmental monitoring applications. It is based on FCM and FLS, which can be easily implemented in MATLAB. Thus, simulations are carried out using MATLAB R2017b. The results are compared with recent clustering protocols: OCMFCM (Su and Zhao 2018), FLECH (Balakrishnan and Balachandran 2017), DCHSM (Jia et al. 2016) and LEACH (Heinzelman et al. 2000). LEACH is the baseline conventional clustering protocol for WSNs and is therefore considered for comparison. OCMFCM, FLECH and DCHSM are included in the comparison since these protocols were developed for scalable WSNs.

The network starts collapsing after the death of the first node. Thus, the stability of WSN is measured in terms of FND performance. WSN is said to be sustainable if more nodes are alive for the maximum number of rounds. Thus, the HND performance is considered as a measurement of the sustainability of WSN.
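
The three lifetime metrics can be extracted from a simulated alive-node curve as sketched below in Python. The sketch assumes the usual definitions: FND is the round in which the first node dies, HND the round by which half of the nodes have died, and LND the round in which the last node dies. The function names and the ten-node trace are hypothetical.

```python
def first_round(alive, predicate):
    """First 1-based round whose alive-node count satisfies the predicate, else None."""
    return next((r for r, a in enumerate(alive, start=1) if predicate(a)), None)

def lifetime_metrics(alive, n_nodes):
    """Return (FND, HND, LND) for a per-round list of alive-node counts."""
    fnd = first_round(alive, lambda a: a < n_nodes)        # first node dead
    hnd = first_round(alive, lambda a: a <= n_nodes / 2)   # half of the nodes dead
    lnd = first_round(alive, lambda a: a == 0)             # last node dead
    return fnd, hnd, lnd

# Hypothetical trace: alive nodes after each of nine rounds in a 10-node network.
alive = [10, 10, 9, 9, 7, 5, 4, 2, 0]
print(lifetime_metrics(alive, 10))  # (3, 6, 9)
```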

In our work, the maximum number of iterations considered for FCM is 100 and the fuzzy factor \(\alpha\) is a constant value of 2 (Rajput and Kumaravelu 2019; Su and Zhao 2018). The energy parameters considered in the simulations are similar to those in Su and Zhao (2018). The data packet size considered is 2400 bits (Rajput and Kumaravelu 2019; Melike et al. 2018). The simulation parameters used are listed in Table 7.

Table 7 Parameters used for simulation

5.1 Performance of proposed protocol for different sink locations

Depending on the application requirement, the sink may be placed inside or outside the FOI. For instance, if a WSN is deployed at a production factory for monitoring of machinery, the sink might be placed at the control office situated inside the factory. Another example is a WSN deployed at farmland for precision agriculture, where the sink can be placed at the farmhouse, which is usually situated at the corner of the farmland. The location of the sink affects the communication distances and ultimately the energy consumption of the sensor nodes. Thus, the performances of the proposed and conventional protocols are simulated for various sink locations. For the following four scenarios, the FOI considered is 100 m × 100 m and the number of nodes deployed is 100. Thus, the coverage ratio, i.e. the FOI area per deployed node, is given as \(\frac{100 \times 100}{100}=100\).

5.1.1 Scenario 1

The sink is placed outside the FOI at (50, 150). The graphs of the total remaining network energy and the number of alive nodes with respect to rounds are shown in Figs. 9 and 10 respectively. The simulations are carried out for five different WSN deployments. The results are observed in terms of FND, HND and LND as listed in Table 8.

Fig. 9 Total remaining network energy vs rounds for scenario 1 with 100 nodes

Fig. 10 Number of alive nodes in network vs rounds for scenario 1 with 100 nodes

Table 8 Network lifetime in terms of FND, HND, and LND for scenario 1 (100 nodes)

It is observed that the proposed protocol enhances the network lifetime compared to the recent conventional protocols. In OCMFCM (Su and Zhao 2018), clusters are formed based on the FCM algorithm, but the CHs are selected from high node density regions. Thus, nodes located away from the dense region utilize more energy for communications, and the FND performance of OCMFCM (Su and Zhao 2018) is poor. In the case of FLECH (Balakrishnan and Balachandran 2017), node centrality is based on one-hop adjacent neighbor nodes. Hence, nodes placed close to each other often have a high priority to become CH, which results in an early occurrence of FND. In DCHSM (Jia et al. 2016), the second set of CHs is selected only after the death of the first kind of CHs, which results in poor FND performance. The LEACH protocol (Heinzelman et al. 2000) randomly selects CHs without considering the energy level of the nodes. This random selection allows a low-energy node to become CH, which degrades the FND performance.

In our work, the communication distances among the cluster members are minimized because of the FCM algorithm-based clustering. This minimizes the energy consumption due to intra-cluster communications. Also, in each cluster, CH is selected based on selection criteria calculated by FLS. The fuzzy rules are defined in such a way that low energized nodes are not prioritized for CH selection. This improved the stability of the network. The stability is enhanced by 90.5%, 92.8%, 93.6% and 20.4% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively.

5.1.2 Scenario 2

The sink is placed at the corner of the FOI, at (100, 100). The graphs of the total remaining network energy and the number of alive nodes with respect to rounds are shown in Figs. 11 and 12 respectively. The simulations are carried out for five different WSN deployments. The results observed are listed in Table 9. As the distance from the sink to the FOI is decreased, CHs require less energy for their data transmissions. Hence, from Tables 8 and 9, it is seen that the WSN lifetime in scenario 2 is enhanced compared to scenario 1. The FND performance is enhanced by 88.9%, 91.8%, 93.5% and 19.1% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively.

Fig. 11 Total remaining network energy vs rounds for scenario 2 with 100 nodes

Fig. 12 Number of alive nodes in network vs rounds for scenario 2 with 100 nodes

Table 9 Network lifetime in terms of FND, HND, and LND for scenario 2 (100 nodes)

5.1.3 Scenario 3

The sink is placed at the border of the FOI, at (50, 100). The plots of the total remaining network energy and the number of alive nodes with respect to rounds are shown in Figs. 13 and 14 respectively. The simulation results for five different WSN deployments are listed in Table 10. The FND performance is enhanced by 87.6%, 92.3%, 93.5% and 18.4% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively.

Fig. 13 Total remaining network energy vs rounds for scenario 3 with 100 nodes

Fig. 14 Number of alive nodes in network vs rounds for scenario 3 with 100 nodes

Table 10 Network lifetime in terms of FND, HND, and LND for scenario 3 (100 nodes)

5.1.4 Scenario 4

The sink is placed at the center of the FOI, at (50, 50). The plots of the total remaining network energy and the number of alive nodes with respect to rounds are shown in Figs. 15 and 16 respectively. The simulation results for five different WSN deployments are listed in Table 11. The FND performance is enhanced by 88.5%, 92.8%, 92.9% and 24.3% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively.

Fig. 15 Total remaining network energy vs rounds for scenario 4 with 100 nodes

Fig. 16 Number of alive nodes in network vs rounds for scenario 4 with 100 nodes

Table 11 Network lifetime in terms of FND, HND, and LND for scenario 4 (100 nodes)

The effects of sink location on the FND, HND, and LND of the simulated protocols are compared in Figs. 17, 18 and 19 respectively. It is observed that as the sink is placed nearer to the WSN, the network lifetime of the protocols increases. The proposed protocol outperforms the other compared conventional protocols in all scenarios. In the proposed work, the intra-cluster communication distance is minimized by using the FCM algorithm. This allows nodes in a cluster to use minimum energy for data transmission toward the respective CH. Also, a node that has more energy and is placed moderately close to the central region of the cluster is selected as CH. This selection is determined using fuzzy rules. Due to proper selection factors, the network load is balanced among the nodes. This significantly enhances the network lifetime of the proposed protocol.
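
The geometric effect of the four sink positions can be checked with a short Python sketch. It computes the mean straight-line distance from points spread evenly over the 100 m × 100 m FOI to each sink location. This is only a proxy, because in the protocol the CHs, not all nodes, transmit to the sink. The dictionary `SINKS`, the function name and the 1 m grid are choices made for this sketch.

```python
import numpy as np

# Sink coordinates of scenarios 1-4 as given in Sect. 5.1.
SINKS = {
    "scenario 1": (50.0, 150.0),   # outside the FOI
    "scenario 2": (100.0, 100.0),  # corner of the FOI
    "scenario 3": (50.0, 100.0),   # border of the FOI
    "scenario 4": (50.0, 50.0),    # center of the FOI
}

def mean_sink_distance(sink, side=100.0, step=1.0):
    """Mean Euclidean distance from a regular grid of FOI points to the sink."""
    axis = np.arange(step / 2.0, side, step)  # cell centers along one side
    xs, ys = np.meshgrid(axis, axis)
    return float(np.mean(np.hypot(xs - sink[0], ys - sink[1])))

distances = {name: mean_sink_distance(sink) for name, sink in SINKS.items()}
for name, d in distances.items():
    print(f"{name}: {d:.1f} m")
```

The mean distance decreases from scenario 1 to scenario 4, in line with the observation that the lifetime grows as the sink moves closer to the network.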

Fig. 17 Effect of sink location on FND performance of the network

Fig. 18 Effect of sink location on HND performance of the network

Fig. 19 Effect of sink location on LND performance of the network

5.2 Performance of proposed protocol for an increase in node density

In this sub-section, the FOI considered is 100 m × 100 m and the number of nodes deployed is 200. The graphs of the total remaining network energy and the number of alive nodes are given in Figs. 20 and 21 respectively. The simulated results for five different WSN deployments are listed in Table 12.

Fig. 20 Total remaining network energy vs rounds for a network with 200 nodes

Fig. 21 Number of alive nodes for a network with 200 nodes

Table 12 Network lifetime in terms of FND, HND, and LND for 200 nodes

From Tables 8 and 12, it is observed that the increase in node density increases the network lifetime. In the case of LEACH (Heinzelman et al. 2000), the random selection of CHs leads to poor FND performance. In the case of DCHSM (Jia et al. 2016), the probability criteria depend on transmission distance alone, and energy is considered only after the death of the first kind of CHs. This results in poor FND performance. In OCMFCM (Su and Zhao 2018), balanced clusters are formed using the FCM algorithm, but the CHs are selected based on the node density. Hence, the performance degrades as the number of dead nodes in the network increases.

The average values of the statistical data in Table 12 are also represented graphically in Fig. 22. The proposed protocol forms balanced clusters based on the FCM algorithm, which reduces intra-cluster distances significantly. The network load is balanced among the nodes using the FLS. The fuzzy rules are set in such a way that a node with an adequate energy level and a shorter transmission distance is elected as CH. Hence, the FND occurs at higher rounds. The stability is enhanced by 83.7%, 92.7%, 92.9% and 19.5% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively. Also, the sustainability of the network is improved by 49.7%, 34.2%, 22.4% and 17.8% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) respectively.

Fig. 22 Average FND, HND, and LND performance of the simulated protocols for a network with 200 nodes

5.3 Performance of proposed protocol for increased coverage area

In this sub-section, the FOI considered is 200 m × 200 m and the number of nodes deployed is 100. The sink is placed at (200, 200). The coverage ratio is given as \(\frac{200 \times 200}{100}=400\). Compared to the coverage ratio considered in Sect. 5.1, it is increased from 100 to 400, because the number of nodes deployed is the same but the FOI is increased from 100 m × 100 m to 200 m × 200 m. For the increased FOI, the communication distances may at times be greater than 100 m. As mentioned in Sect. 3, a sensor node is able to transmit over 10–100 m. If a node needs to transmit beyond 100 m, the propagation model is unknown. As per the studied literature (Balakrishnan and Balachandran 2017; Chang 2015), the multi-path fading channel model is used in simulations for such cases, but in a real-time scenario the proper propagation model is unknown. The FND, HND, and LND performance for five different WSN deployments is listed in Table 13. The graphs of the total remaining network energy and the number of alive nodes are given in Figs. 23 and 24 respectively.
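
The penalty of switching to the multi-path model at long range can be illustrated with the first-order radio energy model that is widely used in the WSN clustering literature. The Python sketch below is not taken from the parameters of this work: the electronics and amplifier constants are commonly quoted values and are assumptions here, as is the resulting crossover distance. Only the 2400-bit packet size comes from the simulation setup.

```python
import math

# Assumed constants (commonly used values, not taken from Table 7).
E_ELEC = 50e-9        # J/bit, transmitter electronics energy
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier energy
EPS_MP = 0.0013e-12   # J/bit/m^4, multi-path amplifier energy
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m for these values

def tx_energy(bits, d):
    """Energy in joules to transmit `bits` over distance d (first-order radio model)."""
    if d < D0:
        return bits * (E_ELEC + EPS_FS * d ** 2)   # free-space model, d^2 loss
    return bits * (E_ELEC + EPS_MP * d ** 4)       # multi-path model, d^4 loss

PACKET_BITS = 2400  # data packet size used in the simulations
short_hop = tx_energy(PACKET_BITS, 50.0)    # within the free-space region
long_hop = tx_energy(PACKET_BITS, 200.0)    # within the multi-path region
print(f"50 m: {short_hop:.2e} J, 200 m: {long_hop:.2e} J")
```

Under these assumed constants, a fourfold increase in distance raises the per-packet transmission energy by more than a factor of twenty, which illustrates why sparse deployments drain CH batteries much faster.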

Fig. 23 Total remaining network energy for FOI of 200 m × 200 m

Fig. 24 Number of alive nodes for FOI of 200 m × 200 m

Table 13 Network lifetime in terms of FND, HND, and LND for FOI of 200 m × 200 m

As the coverage area increases, the communication distance between the sensor nodes also increases. This ultimately increases the energy consumption of the nodes. Even under this sparse condition, the network lifetime of the proposed protocol is better than that of the related conventional protocols. The FCM algorithm is executed at the sink, in which all the nodes are considered by enforcing a degree of membership with the conditional limits given in (12) and (13). The unity value obtained by the summation of a node’s membership towards all cluster center points states that every node is associated with a cluster in the network. Thus, in the proposed protocol none of the nodes is isolated in the network. Each node participates in the CH selection process. This decreases the communication distances of the nodes and ultimately results in less energy consumption even for an increased coverage area.

The FND, HND, and LND performance of the simulated protocols is compared in Fig. 25. For the increased FOI, the longer transmission distances are likely to utilize more energy. Though the network ends after fewer rounds than in the earlier analyses, the proposed protocol improves FND over LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016), FLECH (Balakrishnan and Balachandran 2017) and OCMFCM (Su and Zhao 2018) by 61.3%, 77.7%, 65.2% and 8.34% respectively. The sustainability of the network is also improved by 42.9%, 20.4%, and 30.4% compared to LEACH (Heinzelman et al. 2000), DCHSM (Jia et al. 2016) and OCMFCM (Su and Zhao 2018) respectively. For large coverage areas, FLECH (Balakrishnan and Balachandran 2017) has better HND performance because its cluster center point is estimated at the dense region. But this allows nodes in the dense region to become CHs consecutively. Due to such selection criteria, the network lifetime is reduced and the protocol shows poor FND performance. Though the proposed protocol has lower HND performance than FLECH (Balakrishnan and Balachandran 2017), its FND and network lifetime are significantly better.

Fig. 25 Average FND, HND, and LND of the simulated protocols for FOI of 200 m × 200 m

6 Conclusions

In this research work, a semi-distributed clustering protocol for WSNs is proposed. The clustering is based on the FCM algorithm, while CH selection is based on an FLS. Due to the FCM algorithm, the intra-cluster communication distances are reduced significantly. The uncertainty in the environmental conditions affects the signal strength, which ultimately affects the communication distances. Such uncertainty in network parameters is tackled by deriving proper fuzzy rules for the given FLS. Due to such selection criteria, appropriate and energetic nodes are selected as CHs in the network. This ultimately improves the FND performance of the proposed protocol. The simulations are carried out for different sink locations, scalable node density and coverage area. The simulation results indicate that the proposed protocol outperforms the recent conventional protocols in terms of network lifetime and energy conservation. The results also indicate that the proposed protocol is suitable for a variety of IoT based monitoring applications irrespective of the sink position.

As future work, the energy efficiency of the proposed protocol can be further improved for large coverage areas by incorporating other methods such as multi-hop routing or multi-sink deployment.