1 Introduction

Classifying the type of a 2-phase flow is important in industries such as oil, gas, petrochemicals and food. It allows the process control system to adjust parameters so that particular flow patterns are controlled and the damage that some of them may cause is prevented [1, 2]. Two main techniques are commonly used to identify these patterns. The first analyzes process parameters, e.g., it looks for evidence of a particular type of flow in the velocity and pressure changes along the pipe. The second focuses on processing the recorded sound of the flow. The parameter analysis method is not automated and requires much more time for each instance. The sound-based method, in turn, takes one of two forms: passing the sound signal directly through classification algorithms [23], or analyzing the spectrogram of the flow sound as an image via image processing techniques. Owing to its cost-effectiveness, ease of implementation and the wide variety of available image processing techniques, analyzing the spectrogram of the flow sound as an image appears to be the more convenient method. Because the identification is made by image processing, numerous classification methods become possible. 2-phase flow classification is specified for two configurations: (1) a horizontal pipe and (2) a vertical pipe. In a horizontal pipe, gravity acts perpendicular to the pipe and phase separation occurs. Some of the common 2-phase patterns in industry are bubbly, annular, plug, churn, stratified and slug flow [2].

Bubbly flow is a continuous stream of liquid with bubbles scattered through it. In annular flow, the liquid flows along the pipe's wall and the gas occupies the center. In the plug flow pattern, the bubbles coalesce, and bigger bubbles can be found close to the pipe's wall. An oscillatory flow is referred to as churn flow. In a stratified pattern, the liquid and gas are completely separated by a smooth horizontal interface. Slug flow is the other form, which is seen in crude oil and tar production plants, particularly in pipelines conveying crude oil or liquid gas [3].

SVM and neural-network-based methods have been used for 2-phase flow classification [4, 11]. The authors of [5] measure the performance of an airlift pump at various submergence ratios and compare the results to Hewitt and Roberts' flow pattern map [6]. Flow classification is performed using textural features and an SVM classifier in [7]. Furthermore, a fuzzy neural network is proposed in [8] to identify bubbly, slug and plug flow patterns. However, according to the existing literature, flow characterization faces some obstacles. Because these characterizations are dimensionally sensitive, they can only be used with the specific parameters of the study in which they were derived. Creating a universal flow pattern map for various liquids and pipe settings is a difficult undertaking. The fact that some flows may transition at a particular Weber number, while others are governed by the Reynolds number, makes it almost impossible to come up with a universal dimensionless flow pattern map. Figure 1 depicts the occurrence of various 2-phase flow patterns in a horizontal pipe [9].

Fig. 1 Horizontal two-phase flow patterns [9]

This paper presents flow type identification based on audio signal classification to tackle the flow mapping problems mentioned above. The hidden Markov model (HMM), a traditional speech-recognition technique operating on mel-frequency cepstral coefficients (MFCC), was used in early research on audio signal classification [10, 12]. Later, other techniques such as MPEG-7 audio features [13], matching pursuit features [14] and spectro-temporal signatures [15] were developed as supplements to MFCC. Techniques such as the mel-frequency cepstrum [16], the short-time Fourier transform (STFT) and the Gabor spectrogram [17] are widely used in audio signal classification due to their ability to work with spectra.

The spectrogram of the audio signal provides helpful information for sound classification [24]. Recent approaches treat the spectrogram as a textured image [18, 19, 20, 21, 22, 25] and use image processing techniques for audio signal classification. The spectrogram is not a standard image but a composite image that illustrates the distribution of a signal's intensity over multiple frequencies at different time intervals. By recording the intensity differences between a pixel and its neighbors, the local binary pattern (LBP) approach extracts the spectrogram's micro-patterns.

The LBP feature extractor has several variants, all of which are commonly used in a variety of image processing applications [25, 26]. LBP features obtained from the spectrogram have different applications, one of which is audio signal classification [26]. A drawback of LBP features is their sensitivity to noise [27]. Several factors affect their reliability, mostly related to a lack of components in the LBP histogram. Patch-wise LBP processing is one cause, since it leaves fewer components in each patch's histogram. In addition, the occurrence frequencies of the patterns differ widely, so these patterns cannot be precisely evaluated. LBP feature extraction with an adaptive threshold is proposed to deal with such problems. The proposed method is used to classify and identify four flow patterns from sound signals. Furthermore, the new approach is compared with the three works in [18, 19, 26] using the RWCP database.

The rest of the paper is organized as follows. Section 2 briefly explains the experimental setup for recording the flow's sound in the pipeline. The proposed method is discussed in Sect. 3. Section 4 presents the simulation results. Finally, Sect. 5 presents the conclusions.

2 Experiment's Setup

Two hydrophones are used in the setup, both connected through an amplifier to a signal analysis workstation. The position of the hydrophones is shown in Fig. 2. The system's input consists of signals in the frequency range of 0.1–10 kHz. The diameters of the pipe's upward and downward sides are 5 inches and 3 inches, respectively. Figure 3 illustrates the schematic of the flow loop with the positions of the two hydrophones.

Fig. 2 Position of each hydrophone in the pipe

Fig. 3 Flow loop and the position of the two hydrophones (X marks the position of the hydrophones)

3 Proposed LBP with Gaussian Reweighting Thresholding

3.1 Texture Analysis

In most applications, time–frequency analysis is used to create the spectrogram of a signal. Gammatone filtering and the short-time Fourier transform are two examples of existing approaches [18, 19]. The main difference between these two methods is the spacing of the frequency points on the frequency axis. In Gammatone-like spectrograms, the frequency points are equally spaced on the equivalent rectangular bandwidth (ERB) scale, whereas on the STFT spectrogram they are linearly spaced. This property makes the Gammatone approach a closer approximation of the human auditory system. Time–frequency representations are suitable for providing rich information about the texture. Among the main spectrograms, the logarithm of the Gammatone-like spectrogram (LGS) visually contains more texture information, making the LGS a suitable candidate.
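The difference between ERB spacing and linear spacing can be illustrated with a short sketch. It assumes the Glasberg and Moore ERB-rate formula, which the text does not specify, and uses 50 channels between 100 Hz and 10 kHz as an illustrative configuration; the function names are hypothetical.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore form; an assumption, not stated in the text)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_spaced_frequencies(f_min, f_max, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale."""
    erb_points = np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(f_max), n_channels)
    return erb_rate_to_hz(erb_points)

# Gammatone-like axis: equal steps in ERB rate, so steps in Hz grow with frequency.
centers = erb_spaced_frequencies(100.0, 10_000.0, 50)

# STFT-like axis for comparison: equal steps in Hz.
linear = np.linspace(100.0, 10_000.0, 50)
```

With this spacing, more channels fall in the low-frequency range than on the linear axis, which is the property the paragraph above attributes to the Gammatone representation.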

For the analysis, a bank of 50 Gammatone filters is used to compute the spectrogram, with center frequencies equally spaced on the ERB scale between 100 and 10,000 Hz [18, 19]. The Gammatone spectrogram is denoted by \(S(f,t)\), where \(f\) is the frequency and \(t\) is the time. To enhance the low-power components, the following logarithm is applied, which helps to reveal more detail, as illustrated in [26]:

$$G\left( {f,t} \right) = \log \left( {S\left( {f,t} \right)} \right)$$
(1)

The LGS image is then obtained by normalizing \(G(f,t)\) to the range [0, 1] [26]:

$$I\left( {f,t} \right) = \frac{{G\left( {f,t} \right) - \mathop {\min }\limits_{f,t} G\left( {f,t} \right)}}{{\mathop {\max }\limits_{f,t} G\left( {f,t} \right) - \mathop {\min }\limits_{f,t} G\left( {f,t} \right)}}$$
(2)
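Equations (1) and (2) amount to a logarithm followed by min-max normalization. A minimal NumPy sketch is given below; the small constant `eps` guarding against `log(0)` and the random placeholder spectrogram are assumptions made for illustration only.

```python
import numpy as np

def log_normalize(S, eps=1e-12):
    """Eq. (1) followed by Eq. (2): G = log(S), I = (G - min G) / (max G - min G)."""
    S = np.asarray(S, dtype=float)
    G = np.log(S + eps)  # eps is an assumption to avoid log(0); it is not part of Eq. (1)
    g_min, g_max = G.min(), G.max()
    if g_max == g_min:
        # A constant spectrogram has no dynamic range; return zeros instead of dividing by 0.
        return np.zeros_like(G)
    return (G - g_min) / (g_max - g_min)

# Placeholder power spectrogram with 50 frequency channels and 120 time steps.
rng = np.random.default_rng(0)
S = rng.random((50, 120)) ** 4
I = log_normalize(S)
```

The result `I` has the same shape as `S` and spans exactly the range from 0 to 1, so it can be handled as a grayscale image.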

3.2 Traditional LBP Features

Many feature extraction approaches have been proposed for image analysis, including SIFT [28], HOG [29] and LBP [30]. Among these, LBP is preferable due to its easy implementation and effective results. It is crucial to treat the spectrogram as a synthesized image and to acknowledge the many differences between a spectrogram and a standard image. Some of the visual distinctions are as follows [26]:

Smoothness: In a standard image, the intensities change smoothly, whereas in the spectrogram neighboring pixels can differ remarkably. The reason is that these pixels represent the power distribution of a sound signal, which can vary greatly across time–frequency instances.

Translation, scaling and rotation: In standard images, the location and scale can vary between shots. In the spectrogram, however, translation may only appear along the time axis. Hence, feature extraction methods such as SIFT, which extracts scale-invariant features, are not useful for spectrograms.

Micro-structure: Edges, spots and corners are usually considered essential details in natural images, but these micro-structures may not appear in the spectrogram. Therefore, edge-based feature extraction approaches such as HOG may not be appropriate for spectrogram analysis.

This paper considers LBP features the best way to classify flow patterns due to the following advantages. (1) The LBP feature works with the signs of intensity differences, which makes it robust to monotonic changes in the spectrogram image. (2) In addition to edges and corners, LBP can extract other micro-structures that are not available to HOG. These advantages make LBP a suitable choice for spectrogram feature extraction.

Conventional LBP algorithms use a fixed zero threshold [30]. The LBP code is computed for every pixel. Let \(i\) denote the intensity of a pixel and \(i_p\) the intensity of its \(p\)th neighboring pixel, where \(p = 1, 2, \ldots, P\) and \(P\) is the number of neighbors. \(R\) defines the distance between \(i_p\) and \(i\). With this notation, \(\text{LBP}_{P,R}\) encodes the pixel difference \(z_p = i_p - i\) between \(i_p\) and \(i\) at distance \(R\). Each LBP bit is encoded as follows:

$$b_p = \begin{cases} 1 & \text{if } z_p \ge 0 \\ 0 & \text{if } z_p < 0 \end{cases}$$
(3)

The LBP code is formed as \(\overrightarrow{b_P b_{P-1} \ldots b_1}\) from these LBP bits. The histogram of the \(\text{LBP}_{P,R}\) codes, with dimension \(2^P\), is then used as the feature vector. Using more neighbors captures more information about the image but increases the dimension.
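The conventional encoding can be sketched for the common case of \(P = 8\) and \(R = 1\). The sketch assumes the eight nearest pixels as neighbors (no circular interpolation) and an arbitrary but fixed bit order, and it skips the one-pixel image border; these choices are illustrative and not taken from the text.

```python
import numpy as np

# (row, col) offsets of the P = 8 neighbors at distance R = 1, clockwise from top-left.
# The ordering is an assumption; any fixed order gives an equivalent descriptor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(image, threshold=0.0):
    """LBP code of every interior pixel: bit p is 1 when z_p = i_p - i >= threshold."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        z = neighbor - center
        codes |= (z >= threshold).astype(np.int64) << bit
    return codes

def lbp_histogram(codes, n_neighbors=8):
    """Normalized histogram of LBP codes with 2**P bins, used as the feature vector."""
    hist = np.bincount(codes.ravel(), minlength=2 ** n_neighbors)
    return hist / hist.sum()

ramp = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
code = lbp_codes(ramp)[0, 0]
features = lbp_histogram(lbp_codes(np.arange(64).reshape(8, 8) % 7))
```

For the 3×3 ramp, only the neighbors 6, 9, 8 and 7 are at least as large as the center value 5, so bits 3 to 6 are set and the code is 120. Because only the sign of each difference is used, applying a monotonically increasing intensity mapping to the image leaves the codes unchanged.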

3.3 Proposed Method

Illumination variation is an important factor affecting the pixel intensities of standard images, so the pixel difference \(z_p\) also varies significantly. It follows from Eq. (3) that LBP uses only the sign of the pixel difference and discards its amplitude. However, the amplitude of the pixel difference contains meaningful data describing the spectrogram. Therefore, both the sign and the amplitude of the pixel difference are used in [26] when extracting patterns from the spectrogram. Moreover, the fixed zero threshold of conventional LBP features makes them sensitive to noise [27], since noise can flip the pattern bits when the differences are small. A multi-channel LBP (MCLBP) feature, which quantizes the pixel differences using multiple thresholds to capture both their sign and amplitude, is proposed in [26]. One way to deal with this issue is to threshold the pixel differences at \(T_i\) instead of at 0 [26, 33]:

$$b_p^i = \begin{cases} 1 & \text{if } z_p \ge T_i \\ 0 & \text{if } z_p < T_i \end{cases}$$
(4)

where \(i\) indicates the channel number and \(b_p^i\) is the \(p\)th LBP bit in channel \(i\). In this method, a fixed user-defined threshold is used for each channel. This paper instead proposes a new adaptive threshold function based on local and global spectrogram information. The method does not create additional image channels; instead, the proposed threshold changes from frame to frame. To obtain LBP features that are both fast and robust, the threshold function is based on the cumulative distribution function (CDF) of the Gaussian distribution, as below:

$$f_i = \frac{1}{2}\left[ 1 + \text{erf}\left( \frac{\sigma_i}{\sigma \sqrt{2} \sqrt{\left| \mu - \mu_i \right|}} \right) \right], \quad i = 1, \ldots, M$$
(5)

where \(M\) is the number of spectrogram frames, \(\sigma\) and \(\mu\) are the global standard deviation and mean of the image, and \(\sigma_i\) and \(\mu_i\) are the standard deviation and mean of the \(i\)th frame. The error function \(\text{erf}(x)\) is defined as below:

$$\text{erf}\left( x \right) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt$$
(6)

The proposed function is plotted in Fig. 4 for three different values of \(\mu\) and \(\sigma\).

Fig. 4 The cumulative distribution function of the Gaussian distribution for three different means and standard deviations (s.d.)
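The frame-wise threshold of Eq. (5) can be sketched as follows. The sketch assumes that a frame is a block of consecutive spectrogram columns of fixed width (the hypothetical parameter `frame_width`) and adds a small `eps` to the denominator so that a frame whose mean equals the global mean does not cause a division by zero; neither detail is specified in the text.

```python
import math
import numpy as np

def adaptive_thresholds(I, frame_width, eps=1e-12):
    """Evaluate Eq. (5) for each frame (block of columns) of the spectrogram image I."""
    I = np.asarray(I, dtype=float)
    mu, sigma = I.mean(), I.std()  # global mean and standard deviation
    thresholds = []
    for start in range(0, I.shape[1], frame_width):
        frame = I[:, start:start + frame_width]
        mu_i, sigma_i = frame.mean(), frame.std()  # local statistics of frame i
        # eps is an assumption that keeps the denominator non-zero when mu_i == mu.
        x = sigma_i / (sigma * math.sqrt(2.0) * math.sqrt(abs(mu - mu_i)) + eps)
        thresholds.append(0.5 * (1.0 + math.erf(x)))
    return np.array(thresholds)

# Placeholder image: a flat frame followed by a noisy frame, 10 columns each.
rng = np.random.default_rng(0)
flat = np.full((50, 10), 0.25)
noisy = np.clip(0.5 + 0.2 * rng.standard_normal((50, 10)), 0.0, 1.0)
I = np.hstack([flat, noisy])
f = adaptive_thresholds(I, frame_width=10)
```

Since the argument of the error function in Eq. (5) is non-negative, each \(f_i\) computed this way lies between 0.5 and 1. A frame with no local variation yields the lower bound, and the threshold grows with the local standard deviation \(\sigma_i\).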

Finally, in the adaptive threshold LBP (AT-LBP), each LBP bit is encoded as below:

$$b_p^i = \begin{cases} 1 & \text{if } z_p \ge f_i \\ 0 & \text{if } z_p < f_i \end{cases}$$
(7)

The noise properties determine the threshold function: when the image noise increases, the threshold value increases, and vice versa, which leads to more robust LBP features.
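Putting Eqs. (5) and (7) together, the AT-LBP encoding could look like the sketch below. It rests on several assumptions that the text leaves open: frames are blocks of columns of width `frame_width`, the threshold of a pixel is that of the frame containing it, the neighborhood is the 8 nearest pixels, and pixel differences are compared with \(f_i\) on the same intensity scale, exactly as Eq. (7) is written. Any rescaling the authors may apply is not reproduced here.

```python
import math
import numpy as np

# (row, col) offsets of the 8 nearest neighbors; the bit order is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def frame_threshold(frame, mu, sigma, eps=1e-12):
    """Eq. (5) for one frame; eps (an assumption) avoids division by zero."""
    mu_i, sigma_i = frame.mean(), frame.std()
    x = sigma_i / (sigma * math.sqrt(2.0 * abs(mu - mu_i)) + eps)
    return 0.5 * (1.0 + math.erf(x))

def at_lbp_codes(I, frame_width):
    """Eq. (7): bit p of an interior pixel is 1 when z_p >= f_i of the pixel's frame."""
    I = np.asarray(I, dtype=float)
    h, w = I.shape
    mu, sigma = I.mean(), I.std()
    # One threshold per column, shared by all columns of the same frame.
    col_thr = np.empty(w)
    for start in range(0, w, frame_width):
        frame = I[:, start:start + frame_width]
        col_thr[start:start + frame_width] = frame_threshold(frame, mu, sigma)
    center = I[1:-1, 1:-1]
    thr = col_thr[1:-1][np.newaxis, :]  # thresholds of the interior (center) columns
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        z = I[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc] - center
        codes |= (z >= thr).astype(np.int64) << bit
    return codes

# A single bright pixel on a dark background, treated as one frame.
spot = np.zeros((5, 5))
spot[2, 2] = 1.0
codes = at_lbp_codes(spot, frame_width=5)
```

Unlike the zero threshold of Eq. (3), which sets every bit in a perfectly flat region, the adaptive threshold sets a bit only when the neighbor exceeds the center by at least \(f_i\), so small fluctuations no longer flip bits.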

4 Simulation Results

This section compares flow pattern identification from the spectrogram of recorded sound signals using the proposed method and conventional LBP. The proposed method is investigated for four flow patterns: stratified, churn, annular and slug. In addition, classification on the RWCP database is conducted to further validate the proposed method and to compare the results with three other methods: MFCC-HMM [19], Spectrogram Image Feature (SIF) [18] and MC-BDLBP [26]. The RWCP database [31] is an appropriate benchmark, used in many studies, for validating the performance of classification algorithms. The database consists of crash sounds of plastic and wood and other sounds such as coins, bells and saws. To study the effectiveness of the proposed method, the MFCC-SVM and SIF-SVM are applied by using the proposed AT-LBP features instead of MFCC-HMM and SIF. The experiment is performed under four noise levels, 20, 10, 0 and −5 dB, using the NOISEX-92 database [32]. The linear SVM used for classification has its cost parameter set to 40.

In the first experiment, on the classification of pipeline flow, all the signals are sampled at 16 bits and 50 kHz using two hydrophones. The samples are collected from four classes of flow pattern, 836 samples in total. The number of samples for each class is shown in Table 1. In this experiment, 30% of the samples are used for training, and the added noise is the same for the training and test samples. As shown in Table 2, the proposed method achieves 98.7% correct classification for clean samples, whereas the traditional LBP achieves 96%, and the new method performs much better when noise is added.
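Mixing a noise recording into a clean signal at a prescribed SNR can be done by scaling the noise power, as in the sketch below. The sinusoid and the Gaussian noise are placeholders for a hydrophone recording and a NOISEX-92 noise segment, and the helper name is hypothetical; only the 50 kHz sampling rate and the four SNR levels come from the text.

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that the signal-to-noise power ratio equals snr_db, then add it.

    Assumes `noise` is at least as long as `signal`.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)[:signal.size]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_signal / (scale**2 * p_noise) = 10**(snr_db / 10) for scale.
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

fs = 50_000                                   # sampling rate in Hz, as in the experiment
t = np.arange(fs) / fs                        # one second of samples
clean = np.sin(2.0 * np.pi * 1_000.0 * t)     # placeholder for a recorded flow sound
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)               # placeholder for a NOISEX-92 noise segment

noisy_by_level = {snr: add_noise_at_snr(clean, noise, snr) for snr in (20, 10, 0, -5)}
```

Each noisy signal can then be converted to a spectrogram and passed through the same feature extraction as the clean recordings.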

Table 1 Number of the samples for each flow pattern class
Table 2 Recognition rate of proposed method at different noise levels (%)

The RWCP contains 9722 sound event files in 105 distinct sets. The same partitioning of RWCP as in [18, 19, 26] is applied. A total of 50 audio signal sets are chosen from the RWCP database, with 2500 and 1500 audio signals used for training and testing, respectively. As in [26], two scenarios for adding noise are employed. In the first scenario, the SNRs are uniformly sampled from [−5, 25] dB, with the same type of noise in both the training and test samples. The recognition rates are presented in Table 3. As shown in Table 3, the proposed method achieves a better classification rate than the state-of-the-art methods in [18, 19]. However, the method in [26] performs better on the RWCP database, at the cost of more complexity.

Table 3 Classification rates (%) of different methods under different noise sources on the RWCP database in the first scenario

Table 4 shows the classification rates for the second scenario, where the testing noise type is different from the training noise type. This mismatch between the training and test samples reduces the recognition rate. In this case, the proposed approach shows a better classification rate than MFCC-HMM and SIF-SVM for most noise types. As shown in Table 4, the results are on par with MC-BDLBP while maintaining lower complexity.

Table 4 Classification rates (%) of different methods under different noise sources on the RWCP database in the second scenario

5 Conclusions

This paper proposes a new LBP feature based on thresholding with the cumulative distribution function of the Gaussian distribution. The algorithm is an enhancement of the conventional LBP algorithm. The new algorithm is compared with two other feature extraction techniques, MFCC-HMM and Spectrogram Image Feature (SIF), through Monte Carlo simulations. The simulation results show that the proposed method performs better, providing a higher classification rate with less complexity.