1 Introduction

Hydraulic systems are among the most critical systems in industry because of their versatile capabilities. Their use ranges from weight lifting with a single-stage hydraulic cylinder to applications in heavy earth-moving equipment [1]. A simple hydraulic system consists of actuators and a hydraulic fluid flow circuit, whereas a complex system may comprise several working circuits, such as an actuating circuit and a cooling circuit. Proper functioning of the cooling circuit is as essential as that of the actuating circuit; however, little research progress has been made in condition monitoring of the cooling circuits of hydraulic systems. To date, attempts have been made at leakage detection in hydraulic cylinders [2]. In addition, neural networks have been developed for fault diagnosis of complex fluid power systems and pumping machinery [3, 4]. A survey of the literature suggests the implementation of different machine learning models for fault categorization and diagnosis [5]. In data-driven techniques, the process of gathering the information contained in the acquired signals is called feature extraction; it is a crucial step in the condition monitoring of mechanical components and machinery. Feature ranking techniques are adopted to indicate the importance of the data; information gain, Fisher score and Wilcoxon ranking, among others, have been used for feature selection in the condition monitoring of rotary bearings [6, 7]. The extracted features are fed to a classifier in order to monitor the working behaviour of the mechanical system. In a recent study, a deep neural network model was adopted to inspect the condition of the cooling circuit of a hydraulic system [8]. Model-based fault detection and data-driven fault detection are the two most widely used fault detection approaches for mechanical machinery.
Because of the complexity of the mathematical modelling involved, model-based fault detection cannot be applied frequently; hence, the data-driven approach is preferred.

In this work, a support vector machine (SVM) has been employed to supervise the health condition of the hydraulic system. The cooling circuit of the hydraulic system has been investigated using pressure signals. Four different statistical features have been extracted from the raw pressure signal, and all of them have been fed to the classifier as inputs for behaviour classification. The correlation matrix and the dependency plot have been used to visualize the features and the relationships among them. To achieve better accuracy, two different methods, namely tenfold cross-validation and grid search with cross-validation, have been employed. The results show that better accuracy is achieved with the grid search cross-validation method.

2 Methodology

2.1 Datasets

The predictive maintenance of a mechanical component or machine can be performed using a historical dataset collected over time. Helwig et al. developed a test rig for the automated condition monitoring of a complex hydraulic system [9]. The whole circuit consists of various components. The pressure variation data gathered from the cooling circuit of this test rig has been employed in this study. A schematic view of the considered hydraulic system is shown in Fig. 1. In total, 2205 instances of pressure variation data were gathered, each recorded for 60 s at a sampling frequency of 100 Hz.
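As a minimal sketch, the cycle-wise pressure data can be loaded as a matrix with one 60 s cycle per row (6000 samples per row at 100 Hz). The tab-separated file layout assumed here is illustrative; the actual storage format of the test-rig dataset may differ.

```python
import numpy as np

# Hypothetical loader: assumes one recorded cycle per row, tab-separated,
# with fs * duration samples per row (100 Hz * 60 s = 6000 columns).
def load_pressure_cycles(path, fs=100, duration_s=60):
    data = np.loadtxt(path, delimiter="\t")
    assert data.shape[1] == fs * duration_s, "unexpected samples per cycle"
    return data  # shape: (n_cycles, 6000)
```

For the dataset used here, the returned matrix would have 2205 rows, one per instance.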

Fig. 1

Block diagram showing the hydraulic and cooling circuit

The oil enters the cooling circuit after passing through a contamination filter and exits to the sink through a relief valve. The signals obtained from the pressure sensor belong to three dissimilar classes, as described by Prakash and Kankar [8]. Figure 2 shows the captured pressure signal under the three different working conditions.

Fig. 2

Sampled signature of pressure variation [8]

2.2 Feature Extraction

Feature extraction is the first step in abstracting useful information from the raw signals. There is no mathematical formulation to determine the optimal number of features, so the selection of the features to be extracted is of much significance: the features carry useful information about the raw data, but not all of them are relevant. In [8], four optimal features were fed into a deep neural network and very good accuracy was reported. Accordingly, in this study, the four relevant features explained briefly below have been extracted.

  i.

    Root Mean Square (RMS): It is the square root of the arithmetic mean value of the squares of the signal readings. It is also known as the quadratic mean.

    $${\text{RMS}} = \sqrt {\frac{1}{n}\left( {x_{1}^{2} + x_{2}^{2} + \cdots + x_{n}^{2} } \right)}$$
    (1)
  ii.

    Standard Deviation: It is a statistic that measures the dispersion of a dataset with respect to its mean. It is the square root of the variance.

    $${\text{Standard}}\;{\text{Deviation}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{x}} \right)^{2} }}{n - 1}}$$
    (2)
  iii.

    Kurtosis: Kurtosis is derived from the Greek word “Kurtos” and it indicates how heavily the tails of a distribution differ from the tails of a normal distribution.

    $${\text{Kurtosis}} = \frac{{\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{x}} \right)^{4} }}{{\sigma^{4} }}$$
    (3)
  iv.

    Skewness: Skewness is the distortion or asymmetry in a symmetrical bell-shaped curve, mathematically known as the normal distribution. If the curve is shifted towards either end, then it is said to be skewed.

    $${\text{Skewness}} = \frac{{\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \overline{x}} \right)^{3} }}{{\sigma^{3} }}$$
    (4)
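The four features of Eqs. 1–4 can be sketched in code as follows. The standardized (moment-based) forms of kurtosis and skewness, normalized by σ⁴ and σ³ respectively, are assumed here; the function name is illustrative.

```python
import numpy as np

# Compute the four statistical features of one pressure cycle (Eqs. 1-4).
def extract_features(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    rms = np.sqrt(np.mean(x ** 2))                    # Eq. 1: quadratic mean
    std = np.sqrt(np.sum((x - mean) ** 2) / (n - 1))  # Eq. 2: sample std (n - 1)
    sigma = x.std()                                   # population std for the moments
    kurtosis = np.mean((x - mean) ** 4) / sigma ** 4  # Eq. 3: standardized 4th moment
    skewness = np.mean((x - mean) ** 3) / sigma ** 3  # Eq. 4: standardized 3rd moment
    return np.array([rms, std, kurtosis, skewness])
```

Applying this function to each recorded cycle row-wise yields the feature matrix described below.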

These four extracted features constitute the feature matrix. Sample feature values used to train the SVM are given in Table 1. The feature matrix is split in a ratio of 4:1 for training and testing. Once the training of the classifier is finished, the testing data is fed to it. The k-fold cross-validation method is employed to eliminate the bias associated with the grid search method. The grid search method is explained in Sect. 4.

Table 1 Sample input feature values for SVM

In this work, tenfold cross-validation (k = 10) is used to assess how the model performs when a new dataset is fed to it, whereas grid search is a way to choose the best among a family of models parameterized by a grid of hyperparameters.
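Tenfold cross-validation can be sketched as below. The synthetic two-class feature matrix stands in for the four extracted pressure features; the scikit-learn API calls are standard, but the data and parameters here are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 4-column feature matrix: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# Tenfold cross-validation: ten accuracy scores, one per held-out fold.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
mean_acc = scores.mean()
```

The mean of the ten fold-wise accuracies estimates how the classifier would perform on unseen data.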

2.3 Correlation Coefficient

The correlation matrix is a table that shows the correlation coefficients between pairs of variables. It is used to summarize a large amount of data and to visualize patterns and mutual dependencies. In this study, the correlation factor (r) has been calculated for each of the four extracted features with respect to the others as well as to itself. Since a variable is fully correlated with itself, r on the diagonal is equal to unity. For two given variables, say x and y, the correlation coefficient can be calculated using Eq. 5. The correlation matrix of the extracted features is shown in Fig. 3; higher values in a block indicate that the two corresponding features are highly correlated.

Fig. 3

Correlation matrix for extracted features

$$r = \frac{{n\left( {\sum xy} \right) - \left( {\sum x} \right)\left( {\sum y} \right)}}{{\sqrt {\left[ {n\sum x^{2} - \left( {\sum x} \right)^{2} } \right]\left[ {n\sum y^{2} - \left( {\sum y} \right)^{2} } \right]} }}$$
(5)
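The Pearson formula of Eq. 5 is what `np.corrcoef` implements, so the correlation matrix of the feature columns can be obtained in one call; the helper name below is illustrative.

```python
import numpy as np

# Correlation matrix (Eq. 5) across feature columns; diagonal entries are 1.
def correlation_matrix(feature_matrix):
    # feature_matrix: (n_samples, n_features); rowvar=False correlates columns
    return np.corrcoef(feature_matrix, rowvar=False)
```

For the four-feature matrix of this study, the result is the 4 x 4 matrix visualized in Fig. 3.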

2.4 Dependency Plot

Figure 4 shows the pairwise plots among the four features extracted from the pressure signals. It can be treated as a data visualization technique: it shows the scatter of the feature matrix data for each feature against every other feature. When the distribution of a feature vector is plotted against itself, a histogram is obtained. The histograms shown in Fig. 4 can generally be classified into two classes, i.e. unimodal and bimodal. A unimodal histogram indicates that the data is spread about a single value, whereas a bimodal histogram indicates that the data is spread about two peak values. Most of the histograms (except that of kurtosis) can be interpreted as bimodal. The remaining plots show the distribution of each feature with respect to another. The trends observed in the dependency plot agree well with the interpretations drawn from the correlation matrix in Sect. 2.3.

Fig. 4

Dependency plot for four extracted features
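A dependency plot of this kind (scatter plots off-diagonal, histograms on the diagonal) can be produced with pandas' `scatter_matrix`; the synthetic feature values below are a stand-in for the real feature matrix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic 4-feature data standing in for the extracted pressure features.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 4)),
                  columns=["RMS", "Std", "Kurtosis", "Skewness"])

# 4x4 grid: histograms on the diagonal, pairwise scatter plots elsewhere.
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
```

Each off-diagonal panel corresponds to one cell of the correlation matrix, which is why the two visualizations agree.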

3 Support Vector Machine (SVM)

SVM is a supervised machine learning method that has proved efficient for classification and regression problems. It applies a highly complex data transformation, using various kernels, to define an optimal hyperplane between the possible output categories. SVM attempts to define the boundary between two classes in such a way that the marginal distance between them is maximized, so that the generalization error is minimized. The data points closest to the boundary, which define the margin, are known as support vectors. Given a sample set of N points \(\{({x}_{i},{y}_{i})\}; i = 1\) to N, it is required to determine, among all possible separating planes, the one that separates the input samples into their classes with the least generalization error. For two classes, the data can be labelled as \({y}_{i} = -1\) and \({y}_{i}= +1\) for the respective classes. Non-negative slack variables (\(\xi\)) are introduced for non-separable data. The hyperplane defined by \(f\left(x\right)=0\) separating the given data can be obtained as the solution of the following optimization problem [5].

Minimize \(\frac{1}{2}\left\| w \right\|^{2} + c\sum\nolimits_{{i = 1}}^{N} {\xi _{i} }\).

subject to \(\left\{ {\begin{array}{*{20}c} {y_{i} \left( {w^{T} x_{i} + b} \right) \ge 1 - \xi_{i} } \\ {\xi_{i} \ge 0, i = 1,2,3, \ldots N} \\ \end{array} } \right.\) where c is the error penalty constant.

Figure 5 shows the two classes of data separated by a hyperplane.
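A toy illustration of the margin-maximizing hyperplane: a linear SVM fitted to two separable clusters, where the fitted `w` and `b` define the decision function f(x) = wᵀx + b and the support vectors lie on the margin. The data points are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters labelled -1 and +1, as in the formulation above.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # class -1
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# Large C approximates the hard-margin case for separable data.
clf = SVC(kernel="linear", C=1000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
# clf.support_vectors_ holds the points that define the margin.
```

The sign of f(x) assigns a new point to one of the two classes, which is exactly the two-class situation depicted in Fig. 5.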

Fig. 5

Representation of hyperplane classifying two-class problem by SVM

4 Grid Search Method

The grid search method is a process of scanning the data to configure the optimal parameters of a machine learning model. It tunes the methodically built model by evaluating it with each combination of algorithm parameters specified in a grid. In this study, four different values of the error penalty constant (C) in combination with two different kernels have been employed. Nine discrete values of gamma have also been used in the grid. Table 2 lists the combinations of SVM hyperparameters used in the grid search method.

Table 2 Combination of SVM hyperparameters for grid search

A higher value of the penalty parameter (C) minimizes misclassification of the training data, whereas a lower value maintains a smoother decision boundary. At the same time, if the value of the penalty parameter is set very high so as to classify all the data correctly, there is always a higher chance of overfitting the model. Hence, in this study, the maximum value of C has been kept at 1000.

Gamma (γ) is the inverse of the standard deviation of the Gaussian kernel function and controls the trade-off between bias error and variance in the SVM model. For large values of γ, the influence of each support vector is confined to its immediate vicinity, so an individual support vector does not have much impact on the classification of distant data.
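The grid search described above can be sketched with scikit-learn's `GridSearchCV`. The C, kernel and gamma grids below are illustrative stand-ins for the values of Table 2, and the synthetic data replaces the real feature matrix.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Illustrative hyperparameter grid (the study's Table 2 lists the actual values).
param_grid = {
    "C": [1, 10, 100, 1000],          # capped at 1000 to limit overfitting
    "kernel": ["rbf", "linear"],
    "gamma": [1e-3, 1e-2, 1e-1, 1.0],
}

# Synthetic two-class stand-in for the extracted feature matrix.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(3, 1, (40, 4))])
y = np.array([0] * 40 + [1] * 40)

# Tenfold cross-validation inside the grid search scores every combination.
search = GridSearchCV(SVC(), param_grid, cv=10).fit(X, y)
best = search.best_params_
```

`best_params_` then holds the combination with the highest mean cross-validated accuracy, analogous to the parameters reported in Table 3.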

5 Results and Discussion

5.1 Confusion Matrix

A confusion matrix, often called an error matrix, is generally used to describe the performance of a classification model on a particular set of data for which the actual values are known. In the present study, the support vector classifier parameters are set as penalty parameter C = 1.0 with the ‘rbf’ kernel and γ set to ‘auto’. This classifier is applied to the test data (20% of the total data, 441 instances), and the accuracy achieved is 95.91%. The corresponding confusion matrix is shown in Fig. 6.

Fig. 6

Confusion matrix for support vector classifier
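Building a confusion matrix from predicted and actual class labels can be sketched as follows; the three-class labels below are invented examples, not the study's test results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual vs. predicted labels for the three operating conditions.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)   # rows: actual class, columns: predicted
acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
```

The diagonal of `cm` counts correctly classified instances per class, and `acc` is the overall accuracy reported in the text.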

Further, to enhance the accuracy of the SVM classifier, tuning of the hyperparameters is required. In this work, tenfold cross-validation combined with the grid search method has been performed. Grid search with cross-validation finds the parameters that give the best classifier accuracy by applying the various parameter combinations in one grid. The best parameters for this data, as determined by the grid search method, are listed in Table 3. This combination of parameters yields an accuracy of 98.80%. The corresponding confusion matrix is shown in Fig. 7. The flowchart of the adopted methodology is shown in Fig. 8.

Table 3 Combination of the best hyperparameters according to grid search method
Fig. 7

Confusion matrix with the best parameters

Fig. 8

Flowchart of cooling circuit condition monitoring

6 Conclusions

In this study, the following conclusions and observations can be drawn:

  • The working behaviour of the hydraulic cooling circuit has been determined from the pressure signature using a support vector machine (SVM).

  • The correlation matrix and the dependency plots have been shown to be in good agreement with each other.

  • The grid search method can successfully be employed to determine the best combination of hyperparameters.

  • The accuracy has been boosted from 95.91% to 98.80% by employing the best parameters determined by the grid search method.

  • The best accuracy of the model developed by Prakash and Kankar [8] using a feature ranking technique is 99.54%, which is slightly higher than that achieved in this study. Thus, it can also be concluded that feature ranking plays an important role in boosting the accuracy of machine learning models.