1 Introduction

The centrifugal pump is widely used in industries such as chemical processing and oil and gas because of its simple mechanism and construction. Given the importance of its role, its performance must be maintained. Performance usually degrades because of faulty components, and one of the most common causes of component failure is cavitation. Cavitation occurs when the pressure in the flowing liquid drops below its saturated vapour pressure, causing the liquid to vaporize.

Diagnosing cavitation at an early stage is important because the loss of pump capacity it causes can disrupt production and reduce productivity. An effective method for detecting early cavitation in centrifugal pumps is therefore needed.

Several researchers have investigated early cavitation detection in centrifugal pumps using various methods. Al-Hashmi et al. [1] utilized spectrum analysis of vibration signals. Al-Obaidi [2] examined the use of statistical features in the time domain to detect faults in centrifugal pumps. Nasiri et al. [3] introduced another approach, which applies an Artificial Neural Network (ANN) to vibration signals to detect cavitation in a centrifugal pump. Sakthivel et al. [4] applied the Decision Tree algorithm to detect cavitation in a centrifugal pump. Farokhzad [5] detected several pump faults using the Adaptive Network Fuzzy Inference System (ANFIS); however, the indication of cavitation at the initial stage was difficult to observe.

Researchers then turned to a pattern recognition-based method, the support vector machine (SVM), which is regarded as able to classify data with a high level of accuracy. Ebrahimi and Javidan [6] investigated faults in centrifugal pumps using vibration signal analysis and the SVM method. Vibration signals were decomposed into three levels using the wavelet transform, and descriptive statistical features were extracted from the detail and approximation coefficients. Saberi et al. [7] found that the SVM method had the advantage of a kernel function that could distinguish normal conditions from high noise on the centrifugal pump. Sakthivel et al. [8] compared Proximal Support Vector Machine (PSVM), Gene Expression Programming (GEP), Wavelet-GEP, and SVM, and found that SVM gave the highest accuracy.

Using SVM to detect early cavitation requires a kernel function, and the choice of kernel affects the detection results. The recommended kernel is the Gaussian Radial Basis Function (RBF) because it can classify non-linear data groups [9]. The statistical features used in the time domain also affect SVM performance. Rapur and Tiwari [10] proposed the mean, standard deviation, and entropy as statistical parameters because they gave excellent accuracy. On the other hand, Elangovan et al. [11] reported that the most promising statistical parameters are the standard error and the minimum value.

However, although SVM is reported to be the best classification algorithm for detecting cavitation in a centrifugal pump, there is no standard procedure for choosing the statistical parameters. In this study, statistical parameters are extracted from the time domain rather than the frequency domain, because the vibration spectrum arising from fluid interaction in centrifugal pumps is dominated by noise and random high-frequency vibration that carries little meaningful information. Further development of SVM-based methods is needed to improve accuracy. Hence, this paper aims to find the best combination of statistical features and optimization algorithm for SVM to detect early cavitation and to classify several levels of cavitation in a centrifugal pump.

2 Support Vector Machine

SVM is a method that classifies data based on pattern recognition. Pattern recognition works by separating data into several groups or classes [12]. The method can be regarded as part of an artificial intelligence system built for decision-making. The input data can take various forms, such as numbers, images, sounds, or signal waveforms. The method is very popular in the field of statistics. Vibration signal analysis based on pattern recognition continues to develop, and the number of new pattern recognition-based methods keeps increasing, which suggests that their popularity will continue to grow.

SVM was originally formulated for binary classification and combines several concepts from the field of pattern recognition. It works by finding the best hyperplane that separates the data points in the feature space into two classes. The two classes are assumed to be completely separable by a hyperplane defined by Eq. (1).

$$\overrightarrow{\rm{w}} \cdot \overrightarrow{\rm{x}}\rm{ + b = 0}$$
(1)

If \(\vec{x}_i\) belongs to class −1, it satisfies Eq. (2).

$$\vec{w} \cdot \vec{x}_i + b \le -1$$
(2)

If \(\vec{x}_i\) belongs to class +1, it satisfies Eq. (3).

$$\vec{w} \cdot \vec{x}_i + b \ge +1$$
(3)

The distance between the hyperplane and its closest points (the margin) is maximized by solving the Quadratic Programming (QP) problem in Eq. (4), subject to the constraint in Eq. (5).

$$\min_{\vec{w}} \tau(\vec{w}) = \frac{1}{2}\|\vec{w}\|^{2}$$
(4)

$$y_i \left( \vec{x}_i \cdot \vec{w} + b \right) - 1 \ge 0, \quad \forall i$$
(5)

The problem in Eqs. (4) and (5) can be solved with the Lagrange multiplier (αi) technique, as in Eq. (6).

$$L\left( \vec{w}, b, \alpha \right) = \frac{1}{2}\|\vec{w}\|^{2} - \sum_{i=1}^{l} \alpha_i \left( y_i \left( \vec{x}_i \cdot \vec{w} + b \right) - 1 \right), \quad i = 1, 2, \ldots, l$$
(6)
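The step from the Lagrangian to a problem in αi alone is not spelled out in the text. A standard sketch, in the notation used here, sets the partial derivatives of L with respect to \(\vec{w}\) and b to zero:

```latex
\frac{\partial L}{\partial \vec{w}} = 0
  \;\Rightarrow\; \vec{w} = \sum_{i=1}^{l} \alpha_i y_i \vec{x}_i ,
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0
```

Substituting these two conditions back into Eq. (6) eliminates \(\vec{w}\) and b, so that only the multipliers αi remain as unknowns.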

Furthermore, Eq. (4) is optimized by solving a maximization problem that contains only the multipliers αi, as in Eq. (7).

$$\max_{\alpha} \; \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j$$
(7)

The maximization is subject to the constraints in Eq. (8):

$$\alpha_i \ge 0 \;\; (i = 1, 2, \ldots, l), \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$
(8)

Equation (8) requires every αi to be non-negative. The training points whose αi is strictly positive are the support vectors.
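As a small numerical illustration of Eqs. (1) to (5), the following Python sketch checks the margin constraint for a hand-picked hyperplane on a hypothetical four-point data set. The data points, `w`, and `b` are illustrative assumptions and are not taken from the pump measurements.

```python
import math

# Toy 2-D training set (hypothetical values, not from the pump data).
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Candidate hyperplane w.x + b = 0, chosen by hand for this toy set.
w = (0.5, 0.5)
b = -1.0


def decision_value(x):
    """Return w.x + b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Functional margins y_i (w.x_i + b); Eq. (5) requires each to be >= 1.
margins = [yi * decision_value(xi) for xi, yi in zip(X, y)]
feasible = all(m >= 1.0 for m in margins)

# Points that satisfy the constraint with equality lie on the margin.
on_margin = [xi for xi, m in zip(X, margins) if math.isclose(m, 1.0)]

# Geometric width of the margin between the two classes: 2 / ||w||.
margin_width = 2.0 / math.hypot(*w)

print(feasible, on_margin, round(margin_width, 3))
```

For this toy set the constraint holds for every point, and the two points that meet it with equality, (2, 2) and (0, 0), play the role of support vectors. The margin width is 2√2, which equals the distance between those two points.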

Figure 1 shows patterns from two classes (class A marked with red squares and class B with blue circles) together with several candidate discrimination boundaries (alternative separating lines) from which the best hyperplane is sought. The best hyperplane is determined by measuring the margin and finding its maximum. Figure 2 shows the best hyperplane located midway between the two classes; the patterns closest to the hyperplane are the support vectors.

Fig. 1. Search for optimal hyperplane

Fig. 2. Optimal separating hyperplane

3 Method

Data were acquired by recording 500 vibration signal files for each of four pump conditions: normal (0% valve blockage), level 1 cavitation (0.25% valve blockage), level 2 cavitation (0.50% valve blockage), and level 3 cavitation (0.75% valve blockage). Each file was recorded for 10 s, with a 2 s pause between recordings. The pump rotational speed was regulated at 2850 RPM, and a sampling rate of 17,066 Hz was set to produce a stable change in each condition. The recording was carried out using the NI 9234 acquisition device with the chassis NI DAQ 9174, and an accelerometer mounted on the pump inlet. The data acquisition process was controlled using NI MAX and Matlab R2017a. The test rig consists of a centrifugal pump, a closed-loop pipe network, gauges, valves, a water tank, and a flowmeter.
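The size of the resulting data set follows directly from these settings. The short Python sketch below works through the arithmetic; the condition names are hypothetical labels introduced only for illustration.

```python
# Acquisition settings as stated in the text.
SAMPLING_RATE_HZ = 17_066
RECORD_SECONDS = 10
PAUSE_SECONDS = 2
FILES_PER_CONDITION = 500

# Hypothetical label names for the four pump conditions.
CONDITIONS = ["normal", "cavitation_1", "cavitation_2", "cavitation_3"]

# Number of samples in one 10 s record.
samples_per_file = SAMPLING_RATE_HZ * RECORD_SECONDS

# Total number of records over all conditions.
total_files = FILES_PER_CONDITION * len(CONDITIONS)

# Recording time plus pauses for one condition (no pause after the last file).
seconds_per_condition = (
    FILES_PER_CONDITION * RECORD_SECONDS
    + (FILES_PER_CONDITION - 1) * PAUSE_SECONDS
)

print(samples_per_file, total_files, seconds_per_condition)
```

Each record therefore holds 170,660 samples, and the full data set comprises 2000 records.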

3.1 Feature Extraction and Selection Process

Ten statistical features in the time domain were extracted from the vibration signal, as shown in Table 1. Each feature was then plotted to show how it characterizes the distribution of the vibration data across all conditions. The extracted features were then passed to a feature selection step using Relief Feature Selection, which ranks the statistical features by the weight of their information content. The selected features served as the input for SVM classification.

Table 1. Statistical features in time domain
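A minimal Python sketch of time-domain feature extraction is given below. It covers only features named in the text (mean, minimum, RMS, SD, variance, SE, and crest factor). The exact definitions used in the study are not reproduced here, so the formulas, for example the sample variance with n − 1 in the denominator and SE taken as SD/√n, are assumptions.

```python
import math


def time_domain_features(signal):
    """Compute a few time-domain statistics for one vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    # Sample variance (n - 1 in the denominator); an assumed convention.
    variance = sum((v - mean) ** 2 for v in signal) / (n - 1)
    sd = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in signal) / n)
    peak = max(abs(v) for v in signal)
    return {
        "mean": mean,
        "minimum": min(signal),
        "rms": rms,
        "sd": sd,
        "variance": variance,
        "se": sd / math.sqrt(n),      # standard error of the mean
        "crest_factor": peak / rms,   # peak amplitude over RMS
    }


# Hypothetical short record standing in for one 10 s accelerometer file.
record = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = time_domain_features(record)
print(features)
```

Under these definitions, SE is SD divided by √n. When every record has the same length, the two features therefore differ only by a constant factor.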

3.2 Classification Process

The classification stage was carried out using SVM-based methods. To detect the initial formation of cavitation and to distinguish its levels, two methods were applied: binary and multi-class SVM. The Radial Basis Function (RBF) was used as the kernel function for the mapping process. The multi-class SVM classification was run in three trials: without optimization, and with each of two optimization methods, the Grid Search Method (GSM) and Bayesian Optimization (BO).
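For reference, the Gaussian RBF kernel is commonly written as K(x, z) = exp(−γ‖x − z‖²). The Python sketch below evaluates it for two hypothetical three-feature vectors. The parameter name `gamma` and the numerical values are assumptions for illustration, not settings reported in this study.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical 3-feature vectors (e.g. RMS, SD, variance of two records).
a = (0.2, 0.1, 0.01)
b = (0.5, 0.4, 0.16)

k_same = rbf_kernel(a, a, gamma=1.0)  # identical inputs give 1
k_ab = rbf_kernel(a, b, gamma=1.0)
k_ba = rbf_kernel(b, a, gamma=1.0)    # the kernel is symmetric

print(k_same, k_ab, k_ba)
```

The kernel value decays from 1 towards 0 as two feature vectors move apart, and a larger `gamma` makes that decay faster.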

GSM is an optimization technique based on grid search. The grid search is performed on each mapping function. Every mapping function, whether its result is optimal or not, is evaluated by the GSM algorithm, so the number of evaluations depends on the number of mapping processes carried out. The optimization results are then sorted so that the functions with the best classification parameters come first.
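The exhaustive evaluate-then-sort pattern of a grid search can be sketched in Python as follows. The hyperparameter names `C` and `gamma`, the grid values, and `placeholder_score` are all assumptions: the score function is a stand-in for the cross-validated SVM accuracy, which is not reproduced here.

```python
import itertools
import math


def grid_search(score, grid):
    """Evaluate score(**params) at every grid point, best result first."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score(**params), params))
    # Sort by score only, so ties never compare the parameter dicts.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def placeholder_score(C, gamma):
    """Stand-in for cross-validated accuracy; peaks at C=10, gamma=0.1."""
    return (
        1.0
        - 0.05 * (math.log10(C) - 1.0) ** 2
        - 0.05 * (math.log10(gamma) + 1.0) ** 2
    )


# Hypothetical log-spaced grid over the two RBF-SVM hyperparameters.
grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1, 1.0]}
ranked = grid_search(placeholder_score, grid)
best_score, best_params = ranked[0]
print(best_params)
```

Every grid point is evaluated regardless of how promising it looks, so the cost grows with the product of the grid sizes (16 evaluations here).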

BO was used to evaluate the mapping errors found. The technique uses an acquisition function to determine which mapping function is not yet optimal. That mapping function is then optimized and returned to the training process. The number of mapping functions evaluated corresponds to the number of mapping errors that occurred.

4 Result

The statistical features in the time domain are intended to characterize the vibration data under normal conditions and level 1, 2, and 3 cavitation. Some features, such as RMS, SD, variance, entropy, and SE, separate the pump conditions. However, they fail to separate normal conditions from level 1 cavitation. The selection process is therefore intended to choose the best features as input for SVM classification.

The result of feature selection using Relief Feature Selection is shown in Fig. 3. Variance has the highest weight, whereas the crest factor has the lowest. The three highest-scoring features were used as SVM input because their weights exceeded 0.0050. However, SE was not used as input because it has the same value as SD.

Fig. 3. Feature selection result
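The idea behind Relief weighting can be illustrated with a minimal Python sketch of the basic two-class algorithm. It is a simplified, deterministic variant that visits every instance instead of sampling at random. The study involves four classes, which would call for a multi-class extension such as ReliefF, so this is not the exact procedure used here. The data are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief: reward features that differ on the nearest
    miss and penalise features that differ on the nearest hit."""
    n, d = len(X), len(X[0])
    # Scale each feature to [0, 1] so that differences are comparable.
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]
    Z = [[(row[f] - lo[f]) / span[f] for f in range(d)] for row in X]

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(Z[i], Z[j]))
        miss = min(misses, key=lambda j: dist(Z[i], Z[j]))
        for f in range(d):
            diff_miss = abs(Z[i][f] - Z[miss][f])
            diff_hit = abs(Z[i][f] - Z[hit][f])
            weights[f] += (diff_miss - diff_hit) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [(0.10, 0.5), (0.12, 0.1), (0.11, 0.9),
     (0.90, 0.4), (0.88, 0.8), (0.91, 0.2)]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
print(weights, ranking)
```

A feature that takes similar values within a class and different values across classes accumulates a large positive weight, which is why the discriminative feature ranks first in this toy example.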

Binary SVM classification was performed on the four pump conditions. It consists of two main stages, training and testing. Training builds the classification model and generates the mapping function, while testing evaluates the trained model and determines its accuracy. The input data were split using a cross-validation process to avoid overfitting, giving 800 samples for training and 200 samples for testing.

The multi-class SVM follows the same steps as the binary SVM. It classifies the four pump conditions and shows how the patterns are distributed. The cross-validation process produces 1800 training samples and 200 testing samples, with every condition equally represented in each set.
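One way to obtain splits of these sizes with equal class representation is a stratified hold-out, sketched below in Python. The function name, the random seed, and the assumption that the binary case pairs two conditions of 500 records each with a balanced test set are illustrative and not taken from the study.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, n_test_per_class, seed=0):
    """Return (train_idx, test_idx) with equal test counts per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        test_idx.extend(members[:n_test_per_class])
        train_idx.extend(members[n_test_per_class:])
    return train_idx, test_idx


# Binary case (assumed): 2 x 500 records -> 800 training / 200 testing.
binary_labels = [0] * 500 + [1] * 500
bin_train, bin_test = stratified_holdout(binary_labels, n_test_per_class=100)

# Multi-class case: 4 x 500 records -> 1800 training / 200 testing.
multi_labels = [c for c in range(4) for _ in range(500)]
multi_train, multi_test = stratified_holdout(multi_labels, n_test_per_class=50)

print(len(bin_train), len(bin_test), len(multi_train), len(multi_test))
```

Shuffling within each class before splitting keeps the class proportions identical in the training and testing sets.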

Three experiments were carried out: multi-class SVM without optimization, with GSM optimization, and with the BO algorithm. Of these, the combination of multi-class SVM and BO gave the highest accuracy.

Based on the distribution of the statistical parameters, RMS, SD, variance, entropy, and SE can each separate the four pump conditions, except for the normal and level 1 cavitation classes, which cluster less clearly. Nevertheless, these features are promising as input parameters for SVM.

Table 2 shows that binary SVM classification detects early cavitation with an accuracy of 99%. The four classes are separated using the combination of multi-class SVM and BO, as shown in Table 3, with a maximum accuracy of 100%. The classification accuracy obtained is slightly higher than that in [5], where statistical parameters were extracted from wavelet coefficients. This suggests that the Relief feature selection procedure is more effective for choosing suitable statistical parameters than wavelet-based parameter extraction.

Table 2. Binary SVM accuracy
Table 3. Multi-Class SVM accuracy

The results show that the SVM-based method detects the early cavitation phenomenon, and that the combination of SVM and BO is the best of the tested methods for detecting cavitation at several levels.

5 Conclusion

The study indicates that each statistical feature in the time domain carries specific information about the distribution of the vibration signal. RMS, SD, and variance are the most suitable statistical features as input for SVM classification. SVM was shown to detect early cavitation in the centrifugal pump, as demonstrated by the 99% accuracy in classifying normal versus level 1 cavitation. Optimizing the multi-class SVM with BO is the best combination for detecting cavitation at several levels, with an accuracy of 100%.