1 Introduction

WSNs are useful to monitor physical conditions by sensing events happening within specific area and then passing the data gathered to the sink. It has unlimited potential applications such as military, environmental control, habitat monitoring and acoustic data gathering, etc. Compared with the traditional computer networks, WSNs consist of many resource-constrained small smart nodes with limited processing power, memory space, data processing ability, and communication distance [1]. The traditional approach of relaying data in WSN diminishes the network’s lifetime as sensor nodes are used as a relay and most of their power is consumed by radio transmission in the multi-hop routing [2]. As a solution of minimizing wastage of power, UAVs have been used for traversing the network and collecting the sensing data directly from the sensor nodes [3,4,5,6] to prolong network lifetime by reducing sensors power that might be wasted due to data transmission through multi-hop routing.

UAVs help to maximize the data communication capability of the network as relay of nodes [7]. They support data acquisition from a wide geographical area of interest that are harsh, remote and/or isolated due to lack of communication infrastructure [8]. For instance, they can be used in various military and civilian fields that include search and rescue missions [9], wildfire management, observation support [10], agricultural monitoring, border surveillance and monitoring, environmental and meteorological monitoring [11], etc.

While UAVs provide enormous benefits as a relay for WSNs, it also introduces some challenges that need to be considered. It spends more time to collect data, which in turn incurs a higher data delivery latency. In our previous research work we have published the directional NN algorithm directed to the next nearest node (DDNN) [12]. The DDNN algorithm modifies the existing NN algorithm to gain a reduction in the data delivery latency. Although node and/or link failures occur frequently in WSNs, fault tolerant aspects related to this algorithm were left open and are the focus of this study. As sensor nodes are prone to failure due to energy depletion, hardware failure, communication link errors, and so on, fault tolerance should be seriously considered in UAV relay WSNs. In this paper, we propose a fault tolerable DDNN scheme (FT-DDNN) as an extension of the DDNN algorithm to enable the network to persist in spite of the failed nodes and/or links as well as to reduce the data acquisition latency of UAV relay WSN. The objective of this study is to contribute an efficient method that produce shorter path that enable faster data delivery with fault tolerable capability.

The rest of the paper is organized as follows. Section 2 reviews related works and presents problem description. Section 3 addresses our proposed algorithms. Section 4 describes the Performance analysis of the FT-DDNN algorithm. Section 5 presents performance evaluation and results, and finally Sect. 6 concludes the paper.

2 Related Works and Problem Description

UAVs play a significant role in achieving the effectiveness of data gathering in WSNs. The use of UAV as a relay in wireless sensor networks (WSNs) significantly reduces sensor nodes’ energy consumption as the UAV replaces the multi-hop communication among nodes. This is due to the large radio coverage and conveniently rechargeable batteries of UAVs [4]. The authors in [5] proposes the usage of UAVs as a relay to guarantee the delivery of sensed data from WSN nodes to the base station. They pointed out how UAV resolves connectivity difficulties in WSNs due to failures that may lead to isolation of nodes. Due to the relatively lower speed of UAVs, when compared with electromagnetic or acoustic waves, collecting data through UAVs causes significant delay compared with multi-hop forwarding [6]. This delay of routing data through UAVs in turn suffers higher data delivery latency. Previous research works in the literature focused on TSP based solutions in which each node is visited exactly once in the tour [13, 14].

Since TSP belongs to a family of NP complete problems [15] whose computational complexity increases exponentially with the increment of number of nodes, heuristic algorithms, such as nearest neighbor heuristic TSP algorithm (NN), have been developed for reducing this data delivery latency in shortest possible time. NN is the most straightforward TSP heuristic algorithm which is nearly close to the optimal route with relatively short running time. The key concept of this algorithm is to start with a randomly chosen node and to add the nearest but not yet visited node to the last node in the tour until all the nodes are visited, and then return to the initial node [16]. The NN algorithm can be employed for reducing the length of the tour path of UAV to gather and deliver sensor data in WSN. Although NN runs fast, producing the optimal tour highly depends on the layout of the nodes within the problem.

In our previous research work, we have proposed and published the directional NN algorithm directed to the next nearest node (DDNN) that modify the existing NN algorithm to gain a reduction in the data delivery latency [12]. The basic concept of the DDNN algorithm is to amend the previous NN algorithm to let the UAV initially move directly to the first nearest node of the base station until it reaches to the boundary of the transmission range of the node. Then, it directs to the nearest node of the current node until it arrives to the boundary of the transmission range of the node. The algorithm follows the same procedure until all nodes are visited as shown in Fig. 1. In the figure the dotted line represents the path in the existing NN-TSP algorithm, whereas the solid line indicates the path based on the modified algorithm (DDNN).

Fig. 1
figure 1

UAV path based on the DDNN approach

The DDNN algorithm requires location of the point where the drone turns its direction to the next move. Suppose that every node corresponds to a point in Euclidean space and the points \(P\left( {P_{x} ,P_{y} } \right)\), \(S\left( {S_{x} ,S_{y} } \right)\) and \(Q\left( {Q_{x} ,Q_{y} } \right)\) are locations of the first node, the drone initial point and the point where the normal path of the previous NN-TSP algorithm meets the transmission range boundary of the first node respectively as shown in Fig. 1.

Consider \(\left( {\vec{S} - \vec{P}} \right)\) to be a vector in the direction of \(\vec{S}\). We know that the unit vector \(\frac{{\left( {\vec{S} - P} \right) }}{{\left\| {\vec{S} - \vec{P}} \right\|}}\) is in the same direction with \(\left( {\vec{S} - \vec{P}} \right)\) but with a length of one unit. If we multiply the transmission range r of the node with the unit vector, we reach the circle (the transmission boundary of the node) at point Q in the direction of the unit vector a distance of r units from node P.

$${\text{Thus}},\quad \vec{Q} = \overrightarrow {AP} + r \cdot \frac{{\left( {\vec{S} - \vec{P}} \right) }}{{\left\| {\vec{S} - \vec{P}} \right\|}}$$

Therefore, the point Q with its X and Y coordinates (the point where the normal path of the previous NN-TSP algorithm meets the transmission range boundary of the node) is thus:

$$Q\left( {Q_{x} ,Q_{y} } \right) = \left( {P_{x} + r \cdot \frac{{S_{x} - P_{x} }}{{\sqrt {\left( {S_{x} - P_{x} } \right)^{2} + \left( {S_{y} - P_{y} } \right)^{2} } }} ,\quad P_{y} + r \cdot \frac{{S_{y} - P_{y} }}{{\sqrt {\left( {S_{x} - P_{x} } \right)^{2} + \left( {S_{y} - P_{y} } \right)^{2} } }}} \right).$$

The UAV uses the point \(Q\left( {Q_{x} ,Q_{y} } \right)\) as turning point to the next move in order to shorten the tour length for reducing the data delivery time. The detail of the algorithm is provided in our published paper [12]. Although the DDNN algorithm reduces the data delivery latency by modifying the existing nearest neighbor heuristic TSP algorithm (NN), it does not consider the reliability of the system in case of node or link failures. To collect the sensing data rapidly and reliably, the DDNN algorithm should be able to react to node or link failures effectively and manage the data transmissions in the network.

The situation of WSNs is unpredictable due to various reasons including electromagnetic interferences, system noise, the running out of battery, loss of transmission signal and other faults in the networks [17, 18]. Such kind of faults degrade the data transmission quality and reliability of the WSNs. Specifically, WSNs are prone to failures when nodes are deployed in harsh environments. Atmospheric phenomena, such as changes in weather modify signal propagation, which may in turn cause communication errors as signal strength decreases [19]. Several environmental conditions such as humidity and temperature can modify signal propagation. In WSN, even if the hardware condition of nodes is good, communication between sensor nodes may get affected by signal strength, interferences, and obstacles [20]. Radio interference can cause link failure among nodes. Movement of objects near the nodes, may block the signal. For instance, in agricultural fields [21], pointed out that link range is considerably reduced when plants start growing. As link failure degrades quality in WSNs, a fault tolerance mechanism that tolerates this kind of failure need to be developed.

Fault tolerance is the ability of a system to immediately detect and recover from faults and deliver the desired level of functionality in the presence of faults [22, 23]. It is one of the requirements of a dependable system [24]. The authors [25, 26] designed protocols that are to be resilient against communication faults caused by failed nodes. Those protocols mitigate failures by sending multiple copies of data among different routes, thus increasing the probability of correct reception at the access point. However, the cost of sending multiple copies of data among different routes is significant as communication among nodes consume much energy.

The authors in [27] suggested an algorithm to automatically select redundant nodes in WSNs deployed on areas with obstruction, based on availability requirements of an application and network configurations. The redundancy enhances the availability as redundant nodes replace faulty nodes. Although the redundant sensors form a fault tolerant network, it may incur cost of deployment of redundant nodes.

Suggested work in [28] presented a fault tolerance and energy-efficient clustering (FTEC) algorithm to detect and recover the faults of cluster heads and cluster member nodes. In FTEC, a node as a backup cluster head is selected to detect and recover the faults of cluster heads and cluster member nodes. A weighted median is also utilized to detect and recover the faults of cluster nodes. Faulty node is isolated and replaced by a nearby node switched from sleep mode to wake up mode to recover a fault. The existing fault tolerant mechanisms need to use extra hardware and software resources due to faulty situations.

The authors in [29] have proposed dual separate path (DSP) algorithm which is used to determine redundant paths to provide fault-tolerant communication for WSN applications, such as fault-tolerant cyber physical systems. Although increasing the number of disjoint paths or redundant routes in the WSN can significantly enhance fault-tolerant against link failures, finding path weights and sorting paths incur higher computational overhead.

The authors in [30] have introduced a management framework that can provide fault tolerance capability in network nodes and in the communications between them. In the proposed management framework ECRAFT (Energy-Check Point/Recovery/Based Scheme for Fault), each cluster head node should select a node as its check point node. In ECRATF, cluster heads send aggregated information towards the sink and backup copy of aggregated information to the Check Point node. However, due to the frequent exchange of data between the cluster head and non-cluster head members, energy consumption would increase.

A fault-tolerant cluster based multipath algorithm (FTCM) that utilizes a hybrid energy-efficient distributed clustering approach to cluster nodes is proposed for WSNs in [31]. In FTCM, backup node that stores a copy of the data of cluster head is selected to increase the fault tolerance of cluster head. FTCM also adopts three routing paths between cluster head and base station to increase fault-tolerant routing and improve path reliability. The different routing paths are implemented based on effective routing parameters like residual energy of nodes, number of hops, reliability and propagation speed. However, selecting the backup node for cluster head and discovering an alternative path for reliable data transmission may cause extra computational complexity and resource utilization.

Various mechanisms have been discussed in the literature to tolerate faults in WSNs. However, those mechanisms consume extra costs such as cost of deployment of redundant nodes, use of extra resource utilization, higher computational overhead, high node power consumption, etc. to tolerate faults. The traditional approach of relaying data in WSNs diminishes the network’s lifetime as sensor nodes are involved as a relay and most of their power is consumed by radio transmission. As a resolution of minimizing wastage of power, UAVs have been used for traversing and collecting data directly from the sensor nodes to the sink to prolong network lifetime by reducing sensors power that might be wasted due to data transmission through multi-hop routing. However, during data collection the UAV may lose communications with the faulty or blocked sensors due to link failures related to surrounding environments and radio interference of sensor nodes. To mitigate the problem, we propose a fault tolerable DDNN scheme (FT-DDNN) as an extension of the DDNN algorithm to enable the network to persist in spite of the failed nodes and/or link failures while reducing the data acquisition latency of UAV relay WSNs. The main motivation of our algorithm is driven by the need to provide support for reliability and low latency in the presence of errors. In other words, the algorithm should allow the UAV to visit each node with the objective of increasing the data detection rate and reducing the data delivery latency.

3 The Proposed Algorithm: The Fault Tolerable DDNN Algorithm (FT-DDNN)

A fault in the communication link can cause the UAV not to receive the required data from the sensors as well as relaying the data to the sink. In the DDNN scheme, the UAV may lose communications with the faulty or blocked sensors at a distance from its turning point. As a solution, the UAV must have an effective route even in the presence of failures. To make our proposed algorithm more descriptive, we outline the main issues involved in the data acquisition process of the scheme.

The FT-DDNN scheme assumes the UAV knows a priori the positions of sensor nodes. For convenience, we will refer to the data acquisition scenarios of the DDNN and the FT-DDNN schemes as shown in Figs. 2 and 3, respectively. We call the UAV is in contact with a node when it is within the radio transmission range r of the node. Since the UAV and the node cannot communicate unless they are in contact, we define discovery as the process which allows the UAV to detect a contact with the node and receive data from it.

Fig. 2
figure 2

Representation for UAV relay WSN data acquisition scenario in the DDNN scheme

Fig. 3
figure 3

Representation for UAV relay WSN data acquisition scenario in the FT-DDNN scheme

Reducing the tour length of the UAV using the DDNN scheme reduces the data acquisition latency [12]. However, it affects the system performance as the UAV may not discover a nearby node at its turning point (on the transmission boundary of the node) during its visit due to link or node faults. Data Acquisition will not be performed unless the UAV detects (recovers) the underlying sensor nodes. The probability of discovering a node by the UAV can be improved by increasing the contact time until the node is detected (recovered from failure) as shown in Fig. 3. The basic idea is that if the UAV doesn’t detect the node at its turning point of the DDNN scheme due to some error, it crosses the transmission range boundary directly towards the center of the node (instead of turning its direction to the next node) to get a longer effective contact time thereby allowing the UAV to have the opportunity to recover the error and discover (successfully gather data) from the node. We assume that transient faults occur in sensor nodes or wireless links between UAV and sensor nodes. Once the data is collected from the current node, the UAV turns its direction on its recovery point and moves towards the next nearest node of the current node. The UAV follows the same procedure until it visits all the nodes in the network. By doing so, data detection rate could be increased at the cost of minimum tour length. Figure 4 shows an instance of UAV route based on the FT-DDNN scheme where the dotted and the solid lines represent the paths in the NN and FT-DDNN algorithms, respectively.

Fig. 4
figure 4

The FT-DDNN approach

For transient error and recovery times, we use a random number generated by an exponential distribution. Regardless of the fault type, the rate at which faults occur can be quantified by using probability distributions, and more specifically, the exponential probability distributions. For reliability modeling purposes, electronic components are generally assumed to fail according to the exponential distribution [32]. If a random node or link failure occurs, a recovery mechanism has to be performed by the UAV in order to transmit data from the sensor to the sinker with near the minimum tour cost. To achieve this goal and verify by analysis, the FT-DDNN scheme uses a random number generated by an exponential distribution to let the UAV move directly towards the center of the node until it recovers the error so as to receive the data collected by the node.

The probability density function for the length of time between events with parameter λ in an exponential process is:

$$\Pr \left( {X = x} \right) = f\left( x \right) = \lambda e^{ - \lambda x} ,\quad for\, x > 0$$

Continuous failure rate depends on a failure distribution, which is a cumulative distribution function that describes the probability of failure prior to time t. The cumulative distribution function for an exponential process is the integral of the failure density function, f(x).

$$\Pr \left( {X \le x} \right) = F\left( t \right) = \int_{0}^{t} {f\left( x \right)\,dx,} \quad for\, x > 0$$

This exponential failure distribution function with parameter λ, which is based on the exponential density function, can be used to model the failure distribution.

$$F\left( t \right) = \int_{0}^{t} {\lambda e^{ - \lambda x} \,dx} = 1 - e^{ - \lambda t} ,\quad for x > 0$$

The inverse transformation method can be used to generate random variables from this exponential failure distribution function \(F\left( t \right)\) as follows.

Suppose \(y = F\left( t \right) = 1 - e^{ - \lambda t}\). Then, we can solve for t as follows:

$$\begin{aligned} y & = 1 - e^{ - \lambda t} \\ e^{ - \lambda t} & = 1 - y \\ - \lambda t & = \log \left( {1 - y} \right) \\ t & = - \frac{1}{\lambda }\log \left( {1 - y} \right) \\ \end{aligned}$$

Since \(1 - y\) is also uniformly distributed over the interval (0, 1), it is possible to generate exponential random variables with parameter λ using the transformation as follows:

$$T = - \frac{1}{\lambda }\log \left( Y \right)$$

The error recovery times are also assumed exponentially distributed, although the parameters of the distributions for both the error and recovery times are different. Therefore, we use the random numbers generated by an exponential distribution for error and recovery times for our proposed algorithm (FT-DDNN) described in Table 1.

Table 1 The FT-DDNN algorithm

4 Performance Analysis of FT-DDNN

In this section we analyze the performance of the FT-DDNN in terms of the distance covered by the UAV and the detection rate. To make the analysis more descriptive, we outline the main issues involved in the data acquisition process of the FT-DDNN scheme using Fig. 5.

Fig. 5
figure 5

Performance analysis of FT-DDNN

Assuming that the origin of the current node with transmission range of r units lays on the X-axis, we say that the relative position of the next nearest node is at an angle θ of the current node from the X-axis, where \(\theta\) is an angle formed by the line joining the origin of the two nodes and the X-axis. If the distance between the current node to be visited and the next nearest node, \(d\left( {P,V} \right)\) is \(g\) units, then, the distances \(d\left( {P, U} \right)\) and \(d\left( {U,V} \right)\) will be \(g\,\cos \theta\) and \(g\,\sin \theta\) units respectively.

Suppose the UAV travels directly towards the current node and reaches the boundary of the current node at point Q. There are two cases at this point depending on whether the current node is discovered (recovered) by the UAV or not. The UAV may discover (recover), or may fail to detect the underlying sensor node. The first case happens with a probability of \(\frac{\mu }{\lambda + \mu }\) if the UAV discovers (recovers) the underlying node at point Q and receives data from it. If the first case happens, the underlined node is marked as detected and the UAV turns its direction at point Q and moves directly towards the transmission boundary of the next nearest node to point T a distance of d(Q, T) which is given by

$$\sqrt {\left( {r + g\,\cos \theta } \right)^{2} + \left( {g\,\sin \theta } \right)^{2} } - r$$

Otherwise, the second case happens with a probability of \(\frac{\lambda }{\lambda + \mu }\) if the UAV fails to detect the underlying node at point Q. In this case, the UAV crosses the transmission range boundary directly towards the center of the node (instead of turning its direction to the next node) to get a longer effective contact time thereby allowing the UAV to have the opportunity to recover the error and discover (successfully gather data) from the current node. The second case by itself has three subcases. Each of the three subcases of the second case is described as follows. The first subcase happens with a probability of \(\frac{\lambda }{\lambda + \mu }\left( {1 - e^{ - \mu r} } \right)\) if the UAV discovers (recovers) the underlying node at a certain random point (say at point M) along the path in between the transmission boundary of the current node at point Q and the origin of the current node at point P. If this first subcase happens, the current node is marked as detected and the UAV turns its direction at the point of detection (point M) and moves directly towards the transmission boundary of the next nearest node to point T′. Following this path, the UAV covers a total distance of \(d\left( {Q, M} \right) + d\left( {M, T^{\prime } } \right)\) which is given by:

$${\rm E}\left( r \right) + \sqrt {\left( {r - {\rm E}\left( r \right) + g\,\cos \theta } \right)^{2} + \left( {g\,\sin \theta } \right)^{2} } - r$$

where

$${\rm E}\left( r \right) = \frac{1}{\mu } - \frac{{re^{ - \mu r} }}{{1 - e^{ - \mu r} }}$$

If the UAV fails to detect the current node until it reaches to its origin, either of the second or the third subcases will follow. The second subcase happens with a probability of \(\frac{\lambda }{\lambda + \mu }\left( {e^{ - \mu r} \left( {1 - e^{ - \mu r} } \right)} \right)\) if the UAV discovers (recovers) the underlying node at certain random point in between the origin of the current node (point P) and the point of departure of the UAV from the current node (point O). If the second subcase happens, the current node is marked as detected, and the total distance covered by the UAV from Q to P, and then to T″ will be:

$$d\left( {Q,P} \right) + d\left( {P,T^{\prime \prime } } \right) = {\text{r}} + {\text{g}} - r = g$$

The third subcase happens with a probability of: \(\frac{\lambda }{\lambda + \mu }e^{ - \mu r} \left( {e^{ - \mu r} } \right) = \frac{\lambda }{\lambda + \mu }\left( {e^{ - 2\mu r} } \right)\) if the UAV fails to detect the underlying sensor node along its path until its point of departure (point O). If the third subcase happens, the current node is marked as missed, and the total distance covered by the UAV from point Q to the boundary of the next nearest node T″ across the origin (point P) is the same as subcase 2. The performance of the NN, DDNN and FT-DDNN schemes in terms of the distance covered by the UAV and the detection rate are stated in Table 2.

Table 2 Distance covered by the UAV and detection rate of the schemes

5 Performance Evaluation and Results

In this section, we evaluate the performance of the NN, DDNN and FT-DDNN algorithms in terms of the distance covered by the UAV and the detection rate by varying the different parametric values of \(r, g, \theta , \mu \,and\, \lambda\).

5.1 Comparison of Distance Covered by the UAV Among Each Scheme

Figures 6, 7 and 8 illustrate the performance comparisons in terms the distance covered by the UAV to detect nodes based on the NN, DDNN and FT-DDNN algorithms. Both figures suggest that the proposed scheme shortens the tour length found by the previous NN algorithm, and thus improves the data acquisition latency. The reason for this is that, in the FT-DDNN scheme, once the data is collected from the current node, the UAV turns its direction and moves towards the next nearest node instead of going to the center of the current node. Keeping the error rate λ = 0.1 constant, the impacts of the recovery rate μ on the tour length is also reflected when the value of the recovery rate is incremented from each figure of (a)–(b). It is clearly observed that the tour length based on the FT-DDNN algorithm becomes closer to that of the DDNN algorithm with relatively higher recovery rate. This indicates that the tour length of our proposed algorithm can be improved farther by incrementing the recovery rate.

Fig. 6
figure 6

Distance covered versus transmission range of nodes (r)

Fig. 7
figure 7

Distance covered versus θ

Fig. 8
figure 8

Distance covered versus recovery rate (μ): (with λ = 0.1, \(g\) = 50, r = 20, θ = 30)

5.2 Comparison of Detection Rate Among Each Scheme

In Figs. 9, 10 and 11, we analyzed the detection rate of the proposed algorithm with respect to the NN and the DDNN algorithms in terms of r, θ and μ. Each figure suggests that the detection rate of the proposed scheme (FT-DDNN) is better than that of the DDNN scheme. In addition to that, the detection rate found by the FD-DDNN and the NN approach do not vary with regard to r, θ and μ as the FD-DDNN allows the UAV to go to the center of the node (just like the NN scheme does) if a node is not detected at the boundary of its transmission range. Therefore, as compared to the DDNN scheme, the FT-DDNN scheme increases the number of detected nodes as much as the NN scheme does and maintains the continuity of the network with minimum tour length costs. The reason for this is that, when there is a node which is unreachable by the UAV at the boundary of its transmission range due to fault device or link error, the UAV crosses the transmission range boundary directly towards the center of the node to have the opportunity to recover the error and successfully gather the data from the node before it gets turned to the next nearest node. Hence, the performance analysis demonstrates that the proposed algorithm enhances the DDNN approach in terms of fault tolerant ability. Furthermore, the effects of the recovery rate μ in each figure indicates that, increasing the recovery rate (For instance, from μ = 0.2 of Fig. 9a to μ = 0.5 of Fig. 9b) increases the detection rate in each scheme.

Fig. 9
figure 9

Detection rate versus transmission range of nodes (r)

Fig. 10
figure 10

Detection rate versus θ

Fig. 11
figure 11

Detection rate versus recovery rate (μ)

The FT-DDNN algorithm increases the detection rate at the cost of a little bit more route length as compared to the DDNN algorithm. It should be noted that there is a trade-off between fault tolerance and route length in the DDNN and FT-DDNN algorithms.

6 Conclusion

The FT-DDNN algorithm aims to enhance the fault tolerance capability of the network while considering the reduction of data acquisition latency of the UAV. In the DDNN scheme, the UAV may lose communications with the faulty or blocked sensors at a distance from its traveling path. In this study, we proposed efficient FT-DDNN scheme that produces shorter paths which enables faster data delivery with fault tolerant capability. The proposed algorithm enables the UAV to visit each node with the objective of increasing the detection rate and reducing the data delivery latency. Compared with the DDNN algorithm, the proposed approach increases detection rate at the cost of a little bit more tour length. It also outperforms the NN algorithm in terms of reducing the data delivery latency. The reason for this is that, in the FT-DDNN scheme, once the data is collected from the current node, the UAV turns its direction and moves towards the next nearest node instead of going to the center of the current node. The performance analyses have demonstrated that our proposed scheme improves the detection rate of the DDNN scheme up to 34.93% at the cost of a little bit distance.