1 Introduction

The need to transport the high-demand products of oil and gas from a production point to an area of supply has led to the usage of pipeline infrastructure. These pipelines, which can either be deployed overground or underground, are specifically designed and manufactured to carry these usually toxic and dangerous products through vast areas of a few hundred metres to several thousand kilometres. The areas are often close to hubs of high population or of high environmental sensitivity. With such risks, methods have to be sought to preserve these lines and maintain their condition at a good level [1, 2]. Furthermore, with the intentional tampering of pipeline infrastructure on the increase, through terrorism and vandalism, the need for a reliable pipeline monitoring system is highly significant [3, 4].

Wireless sensor networks (WSNs) are one of the technologies used in the remote monitoring of pipeline infrastructure [5,6,7]. With the deployed nodes, a sensor network can check the infrastructure condition for corrosion and, most importantly, leaks, which may indicate either intentional tampering or a natural phenomenon due to degrading or ruptured equipment. Compared with other technologies that use a wired form of communication, the deployment of sensor networks that communicate wirelessly for pipeline monitoring offers several advantages, which include ease of deployment, real-time and reliable data capturing, concealment and better coverage density [8, 9].

This paper focuses on the leak detection performance of WSNs when deployed on an overground pipeline and an underground pipeline. The investigation is through a statistical analysis and evaluation by the maximum likelihood ratio (MLR) technique used in detection theory. The MLR scheme is chosen for the analytical evaluation of the detection plan because it is a simpler and more straightforward approach than other estimation approaches such as those using the Bayesian criteria [10]. This paper shows that, statistically, the MLR criterion needs only a prior distribution model to turn the detection or estimation problem into an analytical optimization problem, which is the focus of this paper. Additionally, under the general assumption that leak occurrence comes with a large number of observed measurements, the MLR scheme is seen to achieve the smallest possible variance of any estimator. In other words, under that assumption it provides the optimal parameter estimation for the detection problem. Therefore, the single focus on the MLR scheme to evaluate detection performance is a natural approach [11].

The need for such a study is apparent. The occurrence of leaks leads to environmental pollution and a dangerous hazard. Additionally, these leaks are also an economic problem, with millions of tonnes of oil and billions of m³ of gas lost each year, which amount to about 7% of the total transported oil or gas [12]. Early leak detection is therefore of paramount importance and, over the years, WSNs have been used for this purpose in overground and, more recently, underground pipelines [13,14,15,16,17]. This study attempts to provide an insight into the mathematical approach of leak detection performance with WSNs to establish the parameters required for performance. It is important to note that this paper is entirely conceptual and aims to use theory to bring up a conjecture. The approach covers the two settings of overground and underground pipelines, which are distinguished through their wireless communication channel.

2 Related works

Works on pipeline leak detection with WSNs have been pursued at different levels. Magnetic Induction (MI)-based WSN for underground pipeline monitoring, also known as MISE-PIPE, is pursued by Sun et al [18]. Through the study it is discovered that the MISE-PIPE scheme involves sensors deployed both inside and outside the pipeline, which include pressure sensors, acoustic sensors and soil property sensors that cooperatively work together to detect and localize a pipeline leak. The MI-based WSN comprises coils connected to the various sensors that are systematically placed along the underground pipeline structure to provide the sensors with efficient and robust wireless communication among themselves. Sun et al [18] provide a methodology and theory in deploying the MISE-PIPE idea but do not really provide the detection scheme or detection approach in detecting a leak. To be precise, Sun et al [18] focus on the design architecture of the MISE-PIPE rather than on the actual detection performance analysis.

Reference [19] is a research manuscript on WSNs for pipeline water loss detection. Though not entirely focusing on pipelines of the oil and gas industry, Christodoulou et al [19] offer a framework on the WATERSENSE project, whose aim is to receive sensor signals of a deployed WSN on a water pipeline for the purpose of a leakage decision support and management system. They focus on the development and implementation of the WATERSENSE project but again do not provide an actual conceptual idea on the detection performance analysis, which can be important in improving the operational performance of the deployed system.

Nawaz and Rehman [20] provide insight into the different leakage problems that may arise in various pipeline industries and how WSN is used as an applied solution to monitor the pipeline leakage. This reference is more of a general paper that puts some of the WSN leakage detection techniques into one manuscript and aims to briefly describe and compare their functionalities. The study provides good insight on the technologies of pipeline monitoring using WSNs but yet again fails to provide insight on the actual detection approach.

Boaz et al [21] focus on wireless sensor nodes for gas pipeline leak detection and localization. They discuss the theories and environmental constraints for wireless sensor nodes and their usage in detecting the sound excited by jetting gas, which indicates a gas pipeline leakage, and its frequency. This work is precise and focuses in depth on the environmental characteristics of gas in air for sensor detection. An architecture was even proposed and designed using various components such as the XBee RF module and a microcontroller, whose results were produced by simulation. However, just like the previous works, [21] fails to deliver on the performance analysis of detection. This is also the case for references [22,23,24,25]; all of them focus on architecture and implementation.

This paper, unlike the others, attempts to provide an actual detection scheme that provides insight on the detection performance analysis of pipeline leakage. This study and the application of the discussed concepts can enable the design of systems with improved performance. The work also provides a comparison of detection between underground and overground pipelines, which are the two most important deployment scenarios for pipelines in industry. The entire work of this paper is based on the MLR scheme, which, as previously mentioned, provides the optimal parameter estimation since, after all, leak detection is a decision estimation process.

3 Overview of study

Consider the scenario of figure 1, where acoustic omni-directional sensor nodes operating at narrowband frequencies are deployed in cross sectional groups across a pipeline.

Figure 1

Pipeline structure with sensors.

Figure 2

Leak occurrence.

Each group makes up a WSN and has a maximum of four sensors covering each cylinder section of the pipeline. A Base Station (BS), also a sensor device, is deployed at the top of the structure to act as a hub to which all information from the sensors is sent. The hub can be placed at the very end of the pipeline or at intermediate stops between groups of sensors. The sensors are connected with a strip that also has sensing abilities but with very limited capabilities.

Practically, these WSNs consist of MICAz motes with acoustic and pressure sensing capabilities. The outdoor range of these sensor devices would be about 100 m [26]. Therefore, the optimal distance between the sensor strips would also be 100 m. Hence, for a 1000-km pipeline, for example, the structure would need 10,000 strips, each equipped with four motes. Assuming that each mote costs $100, the default deployment cost of the WSNs on the pipeline infrastructure would be $4 million, excluding the BS sensor nodes. This $4 million is cheaper than other schemes such as the fibre optic leak detection cable, which costs around $8 million [27]. Fundamentally, the deployment costs of the WSNs would depend on the sensing range capability of the sensor devices. Therefore, an improved sensing range would drastically decrease costs.
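The cost estimate above is simple arithmetic and can be reproduced with a short sketch. The function name and its parameters are illustrative assumptions and are not part of the original study; base stations are left out, as in the text.

```python
import math


def deployment_cost(pipeline_length_m, strip_spacing_m=100.0,
                    motes_per_strip=4, mote_cost_usd=100.0):
    """Return (strips, motes, cost in USD), excluding base stations.

    Hypothetical helper: one sensor strip is assumed every
    `strip_spacing_m` metres along the pipeline.
    """
    strips = math.ceil(pipeline_length_m / strip_spacing_m)
    motes = strips * motes_per_strip
    return strips, motes, motes * mote_cost_usd


# 1000 km pipeline with the default 100 m spacing
print(deployment_cost(1_000_000))  # -> (10000, 40000, 4000000.0)
```

Doubling the assumed spacing to 200 m halves the number of strips and therefore the cost, which illustrates why the sensing range dominates the deployment cost.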

When a leak occurs, as shown in figure 2, the sensor nodes of each group detect this leak, make a decision on whether the leak occurred or not and send the information wirelessly to the corresponding nearest BS of the group for a further and final decision on the existence of the leak. A sensor device decides to detect the leak by estimating the approximate distance between it and the occurrence depending on its coverage capability. If within coverage, detection is carried out with the BS making its own decision based on multiple decisions from the detecting sensors. With this scenario played out, the WSN detection system is visualized as shown in figure 3 [28].

Figure 3

WSN detection system.

In figure 3, K is the number of sensors (S), i represents the individual sensor number (i.e., S1, S2, S3, …, S i , …, S K ), A i represents the acoustic signal from the leak at sensor i (i.e., A1, A2, A3, …, A i , …, A K ) and f(x i ) is the function of measurements that is sent to the BS if the sensor decides that a leak is present. Fundamentally, a sensor makes this decision based on its noise-degraded measurements whereas the BS makes a decision based on multiple decision functions of measurements from the sensors.

4 Detection testing synthesis

In this section of the paper, a testing problem is formulated to provide insight on the parameters involved with the detection scheme. It is through this preparation that further evaluation is made on the detection performance of WSNs for pipeline leaks.

4.1 Testing problem

The signal picked up by the sensor nodes, in accordance with figure 3, is the effective sound pressure produced by the leak under the surrounding background noise. To be precise, the occurred leak emits an acoustic sound signal that alters the normal state of background noise. The aim is to detect the sound signal amidst noise; since this is random and undefined with uncharacterized possibilities, investigating whether the leak is present or not is a hypothesis testing problem [28,29,30].

Statistically, the detection test involves taking measurements and using them to decide whether a leak has occurred. Thus, there are two possible outcomes:

  • a leak is absent amidst the background noise measurements, defined as the null hypothesis H0.

  • a leak is present amidst the background noise measurements, defined as the alternative hypothesis H1.

To investigate the two possibilities, also referred to as the binary testing procedure, the sound measurements are first modelled as a continuous random variable x. This random variable is a function that maps the random occurrence of the leak to an infinite number of values and is a good representation of the endless, unpredictable nature of noise. Under the hypothesis H0, the testing measurements are modelled as the acoustic background noise that affects the sound pressure measurement of the leak. This noise consists of the sounds of the surroundings where the pipelines are deployed, which is a sum of many sources of sound noise. For simplicity and due to the central limit theorem, which states that the sum of several independent random variables tends to a normal distribution as the number of random variables in the sum grows to infinity [28, 31], the background noise is approximated by random variables that follow a normal distribution. Thus, without the occurrence of the leak and under the null hypothesis H0, the acoustic background noise is modelled by a normal distribution N with mean µ o and variance σ 2:

$$ x = N(\mu_{o} ,\sigma^{2} )\quad Under \, H_{0} $$
(1)

On the other hand, when a leak occurs this assumption is presumed under the alternative hypothesis H1. Here, the normal distribution of the background noise is excited to produce a different normal distribution under H1. This excitation includes the effects of H0 and an unknown mean shift value A, depending on the loudness of the leak. Therefore, the new and alternate sound measurement under H1 is modelled as follows:

$$ x = N(\mu_{o} + A,\sigma^{2} )\quad Under \, H_{1} $$
(2)

It is important to note that the mean shift here is always a positive value because it is assumed that the A value is an additional parameter to the normal state of noise. In other words, A is an added occurrence and therefore an added shift that can only be positive is included to fit into the intuition of an occurred leak amidst noise. Equations (1) and (2) are considered to be the probability density functions (PDFs) that characterize the probable behaviour of the sound measurements under each hypothesis, hence simplifying the subsequent test.

Given this set-up, the performance characterization of the binary test is approached through different probabilities, namely, the probability of detection, the probability of false alarm and the probability of miss. The following are their definitions [32]:

  • The probability of detection (P D) is where H1 is decided and H1 is actually true, denoted as prob{H 1 true, decide H 1 } = P D.

  • The probability of false alarm (P FA) is where H1 is decided but H0 is actually true, expressed as prob{H 0 true, decide H 1 } = P FA.

  • The probability of miss (P M) is where H0 is decided but H1 is actually true, represented as prob{H 1 true, decide H 0 } = P M.

Note that the probability of detection can also be expressed as follows: P D  = 1 − P M, and additionally, the outcome where H0 is decided and is true can be expressed as follows: prob{H 0 true, decide H 0 } = 1 − P FA.

The two most important probabilities that characterize the performance of the testing are P D and P FA, where the ultimate goal is to make the correct decision given the scenario. In deciding which hypothesis is true, the sound measurements are observed to see whether they lie underneath the PDF associated with P FA or the PDF associated with P D. This essentially means the separate integration of the corresponding PDFs to obtain the two probabilities.

Let us assume that R0 indicates the region where the measurements lie under the PDF of P FA, and R1 describes the region for the PDF of P D. R0 and R1 are defined using a test function and a threshold. Assume that T(x) is the function and \( \gamma \) is the threshold. Then, if T(x) is greater than \( \gamma , \) it is decided that the measurements lie in the region R1. Alternatively, if T(x) is less than \( \gamma , \) then the measurements lie in R0. This is closely linked to the binary test, where the corresponding decisions of R0 and R1 also lead to the decisions of H0 and H1, respectively, expressed as follows:

$$ T\left( x \right) > \gamma \;\Rightarrow\; R_{1} ,H_{1} $$
(3)
$$ T\left( x \right) < \gamma \;\Rightarrow\; R_{0} ,H_{0} . $$
(4)

The entire testing problem is therefore illustrated as shown in figure 4 [32],

Figure 4

P D and P FA illustration for a normal distribution of measurement x.

where \( f_{{H_{0} }} (T) = {\text{PDF}} \) of T(x) when H0 is true and \( f_{{H_{1} }} (T) = {\text{PDF}} \) of T(x) when H1 is true. In essence, P FA and P D are as follows:

$$ P_{FA} = \mathop \int \limits_{\gamma }^{\infty } f_{{H_{0} }} \left( T \right)dT, $$
(5)
$$ P_{D} = \mathop \int \limits_{\gamma }^{\infty } f_{{H_{1} }} \left( T \right)dT. $$
(6)
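As a numerical illustration of Eqs. (5) and (6), the sketch below evaluates both integrals in closed form. It assumes, purely for illustration, that the test statistic is a single measurement, T(x) = x, so that the two PDFs are the normal distributions of Eqs. (1) and (2); the function names and the example values are hypothetical.

```python
import math


def q_function(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def false_alarm_and_detection(gamma, mu0, shift, variance):
    """Return (P_FA, P_D) for threshold gamma.

    Assumes T(x) = x with x ~ N(mu0, variance) under H0 and
    x ~ N(mu0 + shift, variance) under H1.
    """
    sigma = math.sqrt(variance)
    p_fa = q_function((gamma - mu0) / sigma)            # Eq. (5)
    p_d = q_function((gamma - (mu0 + shift)) / sigma)   # Eq. (6)
    return p_fa, p_d


# Sweeping the threshold shows that P_FA and P_D fall together.
for gamma in (0.0, 0.5, 1.0, 1.5):
    print(gamma, false_alarm_and_detection(gamma, mu0=0.0, shift=1.0, variance=1.0))
```

Raising the threshold lowers the false alarm probability at the expense of the detection probability, which is the trade-off that the choice of test statistic in the next subsection tries to optimize.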

Putting P FA and P D together provides a conjecture on the receiver performance of the sensor in detecting the leak. To be precise, what is observed is the variation of P D and P FA with respect to a changing \( \gamma \). The next step to the testing is finding the test procedure T(x) that provides optimal performance in the leak detection ability.

4.2 Likelihood ratio test and maximum likelihood detection

In providing an optimal test procedure T(x), the known PDFs of H0 and H1 are further analysed. It is understood that a good performance detector is one where P D is as high as possible and P FA is as low as possible. This means the sensor detector has to maximize P D for a given P FA. To achieve this, the Neyman–Pearson (NP) test is used, where the ratio of the PDF of H1 to that of H0 is evaluated at the measurements x and plugged into Eqs. (3) and (4) in place of T(x). The new equations give rise to the likelihood ratio L(x), which enables the test to maximize P D for a given P FA, expressed as follows [33]:

$$ L\left( x \right) = \frac{{f_{{H_{1} }} (x)}}{{f_{{H_{0} }} (x)}} > \gamma \quad {\text{to}}\;{\text{decide}}\;{\text{that}}\;{\text{H}}_{1} \;{\text{is}}\;{\text{true}} $$
(7)
$$ L\left( x \right) = \frac{{f_{{H_{1} }} (x)}}{{f_{{H_{0} }} (x)}} < \gamma \quad {\text{to}}\;{\text{decide}}\;{\text{that}}\;{\text{H}}_{0} \;{\text{is}}\;{\text{true}} $$
(8)

In summary, Eqs. (7) and (8) say that a comparison is made between the likelihood of observing a value x given that H1 is true and the likelihood of observing a value x assuming H0 is true. The aim based on Eq. (7) is to get a bigger \( f_{{H_{1} }} (x) \) for the purpose of choosing H1. A bigger \( f_{{H_{1} }} (x) \) will mathematically maximize P D for a given P FA; hence the MLR test is introduced through L max(x), where all attempts are made to maximize \( f_{{H_{1} }} (x) \) by estimating unknown parameters [34].

$$ L_{max} (x) = \frac{{{ \hbox{max} }(f_{{H_{1} }} (x))}}{{f_{{H_{0} }} (x)}}. $$
(9)

5 Detection testing application with WSNs

Given the scenario of figure 3 in section 3, the detection system follows a decentralized scheme where sensor nodes detect a leak, make a decision on whether a leak is present or not and send the relevant information to a BS for further decision evaluation on the presence or absence of the leak. Thus, the testing and evaluating scheme has two rules to follow:

  1. sensor decision rule

  2. BS decision rule

Both rules are tested with the MLR test in evaluating the detection performance of a leak.

5.1 Sensors’ decision rules

In accordance with figure 3 and in relation to the derived synthesis in section 4, the MLR test is computed for each sensor i. It is assumed that q samples of measurements are taken per distribution for each sensor i, where q = 1, …, Q. Therefore, Eqs. (1) and (2) are modified to the following single equation [28]:

$$ x_{i} [q] = \left\{ {\begin{array}{*{20}l} {N\left( {\mu_{o,i} ,\sigma_{i}^{2} } \right) } \hfill & {Under\;H_{0} } \hfill \\ {N\left( {\mu_{o,i} + A_{i} ,\sigma_{i}^{2} } \right)} \hfill & {Under\;H_{1} } \hfill \\ \end{array} } \right. $$
(10)

where µ o,i and \( \sigma_{i}^{2} \) can be calculated as follows:

$$ \mu_{o,i} = \frac{1}{Q}\mathop \sum \limits_{q = 1}^{Q} x_{i} [q], $$
(11)
$$ \sigma_{i}^{2} = \frac{1}{Q - 1}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - \mu_{o,i} )^{2} . $$
(12)

Since the measurements follow a normal distribution, the PDF equations defined for each hypothesis are expressed as follows:

$$ f_{{H_{0} }} (x_{i} ) \approx N\left( {\mu_{o,i} ,\sigma_{i}^{2} } \right) = \frac{1}{{(2\pi \sigma_{i}^{2} )^{{\frac{Q}{2}}} }}exp\left\{ { - \frac{1}{{2\sigma_{i}^{2} }}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - \mu_{o,i} )^{2} } \right\} $$
(13)
$$ f_{{H_{1} }} (x_{i} ) \approx N\left( {\mu_{o,i} + {\text{A}}_{i} ,\sigma_{i}^{2} } \right) = \frac{1}{{(2\pi \sigma_{i}^{2} )^{{\frac{Q}{2}}} }}exp\left\{ { - \frac{1}{{2\sigma_{i}^{2} }}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - (\mu_{o,i} + A_{i} ))^{2} } \right\} $$
(14)

With the defined PDFs of (13) and (14), the likelihood ratio test for each sensor is expressed as follows:

$$ L_{i} \left( x \right) = \frac{{f_{{H_{1} }} (x_{i} )}}{{f_{{H_{0} }} (x_{i} )}} > \gamma_{i} $$
(15)

The MLR of (15) requires maximizing (14); if \( \mu_{1,i} = \mu_{o,i} + A_{i} , \) then (14) is modified to

$$ f_{{H_{1} }} (x_{i} ) = \frac{1}{{(2\pi \sigma_{i}^{2} )^{{\frac{Q}{2}}} }}{\rm{exp}}\left\{ { - \frac{1}{{2\sigma_{i}^{2} }}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - \mu_{1,i} )^{2} } \right\} $$
(16)

It is very common with exponential densities to simplify things by taking the natural logarithm. This is because the natural logarithm is a monotonically increasing function, so it does not change the location of the maximum [28]. Thus, the natural log of (16) is

$$ { \ln }(f_{{H_{1} }} (x_{i} )) = - { \ln }(\left( {2\pi \sigma_{i}^{2} )^{{\frac{Q}{2}}} } \right) - \frac{1}{{2\sigma_{i}^{2} }}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - \mu_{1,i} )^{2} . $$
(17)

The equation to be maximized is (17), and an extremum of a differentiable function occurs where its derivative equals zero. Therefore, taking the derivative of (17) with respect to \( \mu_{1,i} \) gives

$$ \frac{{\partial { \ln }(f_{{H_{1} }} (x_{i} ))}}{{\partial \mu_{1,i} }} = \frac{1}{{\sigma_{i}^{2} }}\mathop \sum \limits_{q = 1}^{Q} (x_{i} \left[ q \right] - \mu_{1,i} ) = \frac{1}{{\sigma_{i}^{2} }}\left( {\mathop \sum \limits_{q = 1}^{Q} x_{i} \left[ q \right] - {\text{Q}}\mu_{1,i} } \right) $$
(18)

Equating equation (18) to zero and solving for \( \mu_{1,i,\hbox{max} } \) yields

$$ \mu_{1,i,max} = \frac{1}{\text{Q}}\mathop \sum \limits_{q = 1}^{Q} x_{i} \left[ q \right]. $$
(19)

Thus, substituting \( \mu_{1,i,\hbox{max} } \) from (19) for \( \mu_{1,i} \) in (16) maximizes the likelihood ratio of (15), defining the MLR. In perspective, if a sensor decides that a leak is present, it sends its estimated MLR of (15) to the BS.
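The derivation of Eqs. (13)–(19) can be checked numerically with a minimal sketch. It works with the logarithm of the MLR, in which the common normalization factors of (13) and (16) cancel, and it uses the unconstrained sample mean of Eq. (19) as the estimate. The function names and the sample values are illustrative assumptions.

```python
def sample_mean(samples):
    """Eq. (19): the value of mu_1 that maximizes the H1 likelihood."""
    return sum(samples) / len(samples)


def log_mlr(samples, mu0, variance):
    """ln L_max = ln f_H1(x; mu_1,max) - ln f_H0(x) for one sensor."""
    mu_max = sample_mean(samples)
    sse_h0 = sum((x - mu0) ** 2 for x in samples)     # exponent of Eq. (13)
    sse_h1 = sum((x - mu_max) ** 2 for x in samples)  # exponent of Eq. (16)
    return (sse_h0 - sse_h1) / (2.0 * variance)


def log_mlr_closed_form(samples, mu0, variance):
    """Equivalent closed form: Q * (mean - mu0)^2 / (2 * variance)."""
    q = len(samples)
    return q * (sample_mean(samples) - mu0) ** 2 / (2.0 * variance)


measurements = [1.0, 2.0, 3.0]  # hypothetical samples, Q = 3
print(log_mlr(measurements, mu0=0.0, variance=1.0))              # -> 6.0
print(log_mlr_closed_form(measurements, mu0=0.0, variance=1.0))  # -> 6.0
```

The closed form shows that the log of the MLR grows with the number of samples Q and with the squared distance between the sample mean and the noise mean, and shrinks as the noise variance increases.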

In addition to maximizing (15), further clarifications are pursued as regards the decision rules of the sensor in detection. A threshold, \( \rho_{i} , \) is introduced to replace the assumption of Eq. (8) given that H0 is true. Therefore, a combination of this threshold with that of H1 yields the final decision rule for the sensors given as follows:

$$ \varPhi_{i} \left( {x_{i} } \right) = \left\{ {\begin{array}{*{20}l} {L_{i} \left( {x_{i} } \right)} \hfill & {{\rm{if}}\quad L_{i} \left( {x_{i} } \right) \ge \gamma_{i} } \hfill \\ {\rho_{i} } \hfill & {{\rm{if}}\quad L_{i} \left( {x_{i} } \right) < \gamma_{i} } \hfill \\ \end{array} ,} \right. $$
(20)

where \( \rho_{i} \) and \( \gamma_{i} \) are thresholds that are set.

5.2 BS’s decision rules

At the BS, all decisions are received from the sensors depending on Eq. (20). The job of the BS is to decide whether a leak occurred or not depending on the product of the sensors’ information. Again, if K is the number of sensors, the decision rule at the BS is formulated as follows:

$$ \varPhi_{0} \left( {\varPhi_{1} , \ldots ,\varPhi_{K} } \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\rm{if}}\quad \prod\limits_{i = 1}^{K} {\varPhi_{i} (x_{i} )} \ge \gamma_{0} } \hfill \\ 0 \hfill & {\rm{otherwise}} \hfill \\ \end{array} } \right. $$
(21)

where 1 indicates that a leak is present and 0 indicates that a leak is absent; \( \gamma_{0} \) is the set threshold.

According to Koskiahde [28], a reasonable \( \gamma_{0} \) is calculated as follows:

$$ \gamma_{0} = \alpha \rho^{K} $$
(22)

where \( \alpha \) is a factor that depends on:

  • the number of sensors in the network,

  • the P FA of the sensors and

  • the P FA of the BS.
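The two-stage decision of Eqs. (20)–(22) can be sketched as follows. The sketch assumes that a sensor reports \( \rho_{i} \) whenever its likelihood ratio falls below \( \gamma_{i} \), and that all sensors share the same \( \rho \); the function names and the numerical values of \( \alpha \), \( \rho \) and \( \gamma_{i} \) are hypothetical and chosen only to exercise the rules.

```python
import math


def sensor_decision(likelihood_ratio, gamma_i, rho_i):
    """Eq. (20): forward the likelihood ratio if it reaches gamma_i,
    otherwise forward the fixed value rho_i."""
    return likelihood_ratio if likelihood_ratio >= gamma_i else rho_i


def bs_decision(sensor_outputs, alpha, rho):
    """Eqs. (21)-(22): declare a leak (1) if the product of the sensor
    outputs reaches gamma_0 = alpha * rho**K, otherwise 0."""
    k = len(sensor_outputs)
    gamma_0 = alpha * rho ** k
    return 1 if math.prod(sensor_outputs) >= gamma_0 else 0


# Two sensors: one above its threshold, one below it.
outputs = [sensor_decision(5.0, gamma_i=2.0, rho_i=1.0),
           sensor_decision(0.5, gamma_i=2.0, rho_i=1.0)]
print(outputs)                                   # -> [5.0, 1.0]
print(bs_decision(outputs, alpha=2.0, rho=1.0))  # -> 1
```

If no sensor exceeds its threshold, every output equals \( \rho \), the product equals \( \rho^{K} \), and the BS declares a leak only if \( \alpha \le 1 \); choosing \( \alpha > 1 \) therefore requires at least one sensor to contribute evidence.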

6 Leak sound and communication channel characteristics

As reiterated, when a leak occurs it emits an acoustic energy that propagates both inside and on the outer surface of the pipeline. This energy gives off a sound pressure, which can be measured and analysed. The sound pressure is converted to sound power and depending on the distance from the leak, the sound pressure level at the sensor node can be calculated as follows [35,36,37]:

$$ L_{P} = L_{W} - A_{div} $$
(23)

where \( L_{W} \) is the sound power level relative to a 1 pW reference sound power and \( A_{div} \) is the attenuation due to geometrical divergence given by [28]

$$ A_{div} = 20{\rm{log}}_{10} \left( {\frac{d}{{d_{0} }}} \right) + 11. $$
(24)

In Eq. (24), d is the distance the sound propagates and d o is the reference distance, d o = 1 m. Thus, the sound pressure level L p [dB] at a sensor is

$$ L_{P} = L_{W} - 20{\rm{log}}_{10} \left( {\frac{d}{{d_{0} }}} \right) - 11. $$
(25)

Equation (25) gives the measure of L p in terms of distance and sound power level. The relative measure of L p [dB], however, which expresses the pressure level in terms of a reference pressure, is given by [38]

$$ L_{P} = 20{\rm{log}}_{10} \left( {\frac{{P_{rms} }}{{P_{ref} }}} \right) $$
(26)

where P rms is the root-mean-square sound pressure in pascals (Pa) and P ref is the reference pressure. P ref is usually given as \( 20\;\upmu{\text{Pa}}\left( {\text{RMS}} \right) = 2 \times 10^{ - 5} \;{\text{Pa,}} \) which is the typical value for hearing in air.
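Equations (25) and (26) can be combined in a short sketch that converts a sound power level and a distance into a sound pressure level, and a sound pressure level into an RMS pressure. The function names are illustrative, and the default reference values are those stated in the text (d o = 1 m, P ref = 20 µPa).

```python
import math

P_REF_PA = 2e-5  # reference pressure for air, 20 micropascals


def sound_pressure_level(l_w_db, distance_m, d0_m=1.0):
    """Eq. (25): L_P = L_W - 20*log10(d/d0) - 11, in dB."""
    return l_w_db - 20.0 * math.log10(distance_m / d0_m) - 11.0


def rms_pressure(l_p_db, p_ref_pa=P_REF_PA):
    """Eq. (26) solved for P_rms, in pascals."""
    return p_ref_pa * 10.0 ** (l_p_db / 20.0)


# Each tenfold increase in distance removes 20 dB.
print(sound_pressure_level(100.0, 1.0))   # -> 89.0
print(sound_pressure_level(100.0, 10.0))  # -> 69.0
print(rms_pressure(0.0))                  # -> 2e-05
```

Under this reference, a sound pressure level of 35 dB corresponds to an RMS pressure of roughly 1.1 mPa.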

To distinguish between the underground and overground settings for the performance of sound pressure level detection, it is assumed that an underground WSN deployment takes into consideration the presence of soil or rock that hampers performance. It is also assumed that overground there is no such hampering. Therefore, with this consideration, the signal to noise ratio (SNR) would be different for the two settings, which also affects the probability of detection performance. The SNR is estimated by the following equation [12]:

$$ SNR = 10\log_{10} \left( {\frac{{L_{W} }}{\sigma }} \right)dB $$
(27)

where L W is the sound power in watts or numeric representation, and \( \sigma \) is the noise variance. Using Eq. (27), the variance is adjusted to suit the conditions of the settings. Since it is assumed that the underground setting hampers the signal more than the overground setting, the variance for underground would be significantly higher than the variance for overground \( \left( {\sigma_{Under} > \sigma_{Over} } \right). \) To elaborate on this notion of variance, in the underground environment, signal propagation would be heavily affected and attenuated by the presence of soil and rock. Hence, measurements coming from the leak to the sensor devices or communication between the sensor devices and the BS would be less precise and less accurate. In terms of the variance of the PDF, a larger variance would indicate a less precise estimator, which consequently would mean a less accurate hypothesis test [39]. This higher variance flattens the probability distribution curve more than the lower variance does. Therefore, this decreases the SNR and decreases the probability of detection performance for the underground setting. The SNR percentage difference between the two settings is calculated as follows:

$$ \Delta SNR = \frac{{SNR_{Over} \left( {dB} \right) - SNR_{Under} \left( {dB} \right)}}{{SNR_{Over} \left( {dB} \right)}} \times 100\% . $$
(28)
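A minimal sketch of Eqs. (27) and (28) follows. The signal power and the two variances are hypothetical values chosen only so that the underground variance exceeds the overground one; they are not the simulation parameters of this study.

```python
import math


def snr_db(l_w, sigma):
    """Eq. (27): SNR in dB from sound power l_w and noise variance sigma."""
    return 10.0 * math.log10(l_w / sigma)


def snr_percentage_difference(snr_over_db, snr_under_db):
    """Eq. (28): relative SNR loss of the underground setting, in percent."""
    return (snr_over_db - snr_under_db) / snr_over_db * 100.0


# Hypothetical values: same signal power, tenfold larger variance underground.
snr_over = snr_db(100.0, 1.0)
snr_under = snr_db(100.0, 10.0)
print(snr_over, snr_under)                             # -> 20.0 10.0
print(snr_percentage_difference(snr_over, snr_under))  # -> 50.0
```

Because Eq. (28) divides by the overground SNR in dB, the percentage is only meaningful when that SNR is non-zero.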

Linked to the SNR is the packet loss probability for performance. According to [40], the probability of a packet lost in the wireless communication channel is

$$ P_{\gamma } \left( {packet\,lost} \right) = \left( {\left( {1 - p} \right) + p\left( {1 - \left( {1 - p} \right)^{B - 1} } \right)} \right)^{S} $$
(29)

where B is the number of sensors in the network involved in the detection and S is the number of time slots. The optimal probability (p) of sending data in one time slot is

$$ p = \frac{2}{K + 2} $$
(30)

where K is the total number of sensors in the network.
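Equations (29) and (30) can be evaluated directly, as in the sketch below. The reading of the bracketed term as the per-slot failure probability (the sensor either stays silent, or transmits while at least one of the other B − 1 sensors also transmits) is an interpretation of the formula rather than a statement from [40], and the function names are illustrative.

```python
def optimal_send_probability(k):
    """Eq. (30): per-slot transmission probability for K sensors."""
    return 2.0 / (k + 2)


def packet_loss_probability(p, b, s):
    """Eq. (29): probability that a packet is lost over S time slots
    when B sensors are involved in the detection."""
    per_slot_failure = (1.0 - p) + p * (1.0 - (1.0 - p) ** (b - 1))
    return per_slot_failure ** s


p = optimal_send_probability(2)                # -> 0.5
print(packet_loss_probability(p, b=2, s=1))    # -> 0.75
print(packet_loss_probability(p, b=2, s=2))    # -> 0.5625
```

For fixed p, the loss probability grows with the number of contending sensors B and falls as more time slots S are allowed.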

7 Evaluating leak detection performance with WSN

This evaluation follows the works of Koskiahde [28] and reference [41], where sensors were deployed and a leak occurred 172.5 m away with an acoustic power of 35 dB. Using Eq. (26), P rms was calculated to be 1 mPa. This value was used as the mean of the system, whose variance and SNR values were chosen to fit the settings of an underground and overground environment.

The leak detection with WSNs uses the MLR scheme, and the parameters used to analyse performance for the two settings of underground and overground are presented in table 1. The parameters of table 1 are of a general case and apply to all materials transported through the pipeline. The simulation procedure was as follows.

Table 1 Simulation parameters.

Sensor nodes and a leak occurrence were first placed in xy coordinates. It was assumed that both sensor and leak placement were within 1 m in diameter of pipeline structure. Hence, in a two-dimensional setting, the y-axis or height extends from 0 to 1 m, while x-axis extends from 0 to ∞ m. The distance of the leak from all placed sensors and sound pressure were then computed. The sensor range value is presented in table 2; a distance less than the range implies that the associated sensors had detected the leak determined by Eq. (25). Noise was added into the sensor computation using the following equation:

$$ x = \mu + \sqrt \sigma *(random\,values). $$
(31)

Table 2 Sensor parameters

To generate data of the sensors for the H0 and H1 criteria, the following condition was used:

$$ H_{0} = x\quad {\rm{and}}\quad H_{1} = x + L_{P} $$
(32)

where L p is the calculated sound pressure from the leak to all associated sensors determined by range. Once the H0 and H1 data were generated, the maximum likelihood was then calculated using the derivations in section 5.1 (Eqs. (10)–(20)) based on a set threshold \( \rho . \) A wireless communication simulation for sensor to BS was then carried out using Eqs. (29) and (30) based on \( \rho . \) A centre fusion processing scheme at the BS was then established using Eqs. (21) and (22) with a general simulated result and insight into the number of packets dropped and the number of detections at the BS.
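The data generation step of Eqs. (31) and (32) can be sketched as follows. The sketch assumes that the "random values" of Eq. (31) are standard normal draws and that \( \sigma \) denotes the noise variance, as in Eq. (27); the function name, the seed handling and the numerical values are illustrative and are not the simulation parameters of tables 1 and 2.

```python
import math
import random


def generate_measurements(mu, sigma, l_p, n_samples, seed=0):
    """Return (h0, h1) sample lists for one sensor.

    h0 follows Eq. (31): mu + sqrt(sigma) * standard normal draw.
    h1 follows Eq. (32): the same noise plus the leak contribution l_p.
    """
    rng = random.Random(seed)  # seeded for repeatability
    h0 = [mu + math.sqrt(sigma) * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    h1 = [x + l_p for x in h0]
    return h0, h1


# Hypothetical values: larger variance for the underground setting.
h0_over, h1_over = generate_measurements(mu=1.0, sigma=0.1, l_p=2.0, n_samples=100)
h0_under, h1_under = generate_measurements(mu=1.0, sigma=1.0, l_p=2.0, n_samples=100)
print(sum(h1_over) / len(h1_over) - sum(h0_over) / len(h0_over))
```

The difference between the two sample means recovers the leak contribution, while the spread around those means is set by the chosen variance.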

The simulated results and discussion are as follows.

A deployment of two sensors with a varying \( \rho \) in the detection of the leak was first simulated for the underground and overground settings. Figure 5 shows the result.

Figure 5

Probability of detection vs. \( \rho \) at Base Station for two sensors.

From figure 5, it can be seen that, as expected, an increasing \( \rho \) (threshold for the probability of false alarm) decreases the probability of detection, therefore hampering performance. However, this degradation is noticeably more significant for an underground pipeline than for an overground pipeline: the difference is about 15% or more for a high-performing \( \rho \) (i.e., \( \rho \) = 1, 2, 3) and about 5% or less for a low-performing \( \rho \) (i.e., \( \rho \) = 6, 7, 8). This indicates that in order to achieve a good detector for the sensor, \( \rho \) must be as low as possible. However, this low threshold comes with a cost as shown in figure 6.

Figure 6

Total number of packets dropped at Base Station vs. \( \rho \) for two sensors.

As observed from figure 6, a low \( \rho \) also means more packets dropped during wireless communication between sensor node and the BS. This again is more significant for the underground pipeline than overground, indicating that the presence of soil or rock significantly hampers communication and therefore also affects performance. Consequently, in the underground situation, there needs to be some kind of compromise in setting the threshold to achieve adequate detection and a low rate of dropping of packets.

A further insight is obtained in determining the other factor that can affect the performance of detection. The number of sensors in the WSN was investigated. Figure 7 shows performance through the number of sensors for the best observed threshold of \( \rho \) = 1.

Figure 7

Probability of detection vs. number of sensors for \( \rho \) = 1.

As observed, and as expected from the analysis, the low threshold achieves 100% detection for any number of sensors in the WSN in the overground setting. In the underground setting, however, a considerably larger number of sensors is needed to reach 100%. Under the default deployment configuration, two sensors in total, one on each (opposite) side of the pipeline, are enough to achieve 100% detection for an overground pipeline, whereas an underground pipeline requires two groups of four sensors, spaced out accordingly. This indicates that an underground WSN deployment will be considerably more expensive than an overground deployment.
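The effect of the number of sensors can be sketched with a simple fusion rule. This is an assumption made for illustration, not the centre fusion scheme of Eqs. (21) and (22): the BS is taken to declare a leak if at least one sensor reports a detection (an OR rule), and the sensors are assumed to detect independently. The function names and the per-sensor probabilities in the usage lines are hypothetical.

```python
def fused_detection_probability(per_sensor_pd):
    """Probability that at least one sensor detects, assuming independence."""
    miss = 1.0
    for pd in per_sensor_pd:
        miss *= 1.0 - pd  # every sensor must miss for the fused decision to miss
    return 1.0 - miss


def sensors_needed(pd, target):
    """Smallest number of identical, independent sensors whose OR-fused
    detection probability reaches `target`."""
    if not 0.0 < pd <= 1.0:
        raise ValueError("pd must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    count = 0
    miss = 1.0
    while 1.0 - miss < target:
        miss *= 1.0 - pd
        count += 1
    return count


# Hypothetical per-sensor detection probabilities, not taken from the paper.
print(sensors_needed(0.9, 0.95))   # a strong per-sensor detector
print(sensors_needed(0.5, 0.99))   # a weak per-sensor detector
```

Under this rule, a weaker per-sensor detector needs more sensors to reach the same fused target, which is qualitatively the behaviour reported for the underground setting. The target is kept below 1 because OR fusion of imperfect sensors approaches, but never exactly reaches, certain detection.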

To observe the worst-case scenario, \( \rho \) was set high at 10; figure 8 shows the results.

Figure 8

Probability of detection vs. number of sensors for \( \rho \) = 10.

It can be seen that a high threshold significantly hampers performance and that many sensors are needed to achieve adequate detection. Again, this is noticeably worse for an underground setting than for an overground one.

The probability of detection was then plotted against the total number of packets dropped for given numbers of sensors and values of the threshold \( \rho \), as shown in figures 9–12.

Figure 9

Total number of packets dropped vs. probability of detection at BS for \( \rho \) = 1 with two sensors.

Figure 10

Total number of packets dropped vs. probability of detection at BS for \( \rho \) = 1 with four sensors.

Figure 11

Total number of packets dropped vs. probability of detection at BS for \( \rho \) = 10 with two sensors.

Figure 12

Total number of packets dropped vs. probability of detection at BS for \( \rho \) = 10 with four sensors.

As observed from the figures, the probability of detection has a linear relationship with the total number of packets dropped: as the probability of detection increases, so does the total number of packets dropped. Additionally, as the number of sensors in the network increases, so does the probability of detection and, consequently, the total number of packets dropped. In essence, the number of sensors directly affects both the detection capability and the network communication performance of the WSN.

Applying these observations to the underground and overground deployment scenarios, the underground setting suffers more severe packet loss than the overground setting as the probability of detection increases. This indicates that, under the stated assumptions, detection performance underground is weak compared with overground. Again, as shown in figure 6, the threshold is a significant factor in the total number of packets dropped and hence in the detection performance of the WSN. At \( \rho \) = 1, more packets are dropped than at the higher threshold of 10. Equally, as also shown in figure 6, a higher threshold means a lower probability of detection. Hence, the threshold is inversely related to both the total number of packets dropped and the probability of detection.

In summary, to achieve adequate detection in the underground pipeline scenario, \( \rho \) must be low, subject to a compromise on the total number of packets dropped. Equally, more sensors must be used to monitor the pipeline, and they must be spaced at smaller distances from each other than in an overground deployment.

8 Conclusion

This paper provides an analysis and evaluation of leak detection performance for underground and overground pipelines. It was found that the threshold for the probability of false alarm is crucial in determining adequate detection of the source. The number of deployed sensors is another factor that affects performance. With regard to the surrounding environment, it was found that leak detection performance is severely hampered in the underground situation because of the presence of soil and rock, which affects the noise variance and therefore decreases the probability of detection. For overground pipelines, under the assumption that no such obstacle is present, detection performance remains high, with a low noise variance and hence a high probability of detection. From the study, it can be concluded that any underground sensor deployment for detection would require more sensors and as low a threshold as possible, chosen with the number of packets dropped taken into account, to achieve adequate detection performance.