1 Introduction

Hydraulic transmission systems play an important role in modern industry, and condition monitoring of hydraulic systems has attracted increasing attention in recent decades (Jegadeeshwaran and Sugumaran 2015; Sepasi and Sassaniv 2010; Fu et al. 2014). Axial piston pumps are commonly used in hydraulic systems. Piston pump defects may cause machine breakdowns and lead to severe economic losses or even catastrophic casualties. In fact, pump defects account for a large percentage of the faults of electromechanical equipment (Ferdowsi et al. 2014; Lan et al. 2018; Wang et al. 2018a, b). Consequently, detecting pump faults precisely and effectively has become an urgent task to ensure the safety and reliability of hydraulic systems.

Faults of axial piston pumps mostly occur in the cylinder block, swash plate, bearings, pistons, etc. Pump health condition monitoring can be undertaken using different sensors, e.g., accelerometers, flowmeters, and current sensors. Among them, vibration signals collected by accelerometers have enjoyed great success in condition monitoring and fault diagnosis of rotating machinery (Wang and Liang 2012; Xiang et al. 2015; Samanta et al. 2006; Pandya et al. 2014; Wang et al. 2009; Qiao et al. 2017). Many existing machinery diagnosis methods have been adapted to detect faults in hydraulic pumps, including wavelet packet analysis (Gao and Zhang 2006), intermittent chaos (Zhao et al. 2009), local mean decomposition (LMD) (Jiang et al. 2014), empirical mode decomposition (EMD) (Lu et al. 2016, 2015), and minimum entropy deconvolution (MED) (He et al. 2016; Du et al. 2017; Dong et al. 2017). Gao and Zhang (2006) applied wavelet packet transformation and wavelet coefficient residual analysis to discharge pressure signals to detect faults in hydraulic piston pumps. Zhao et al. (2009) introduced intermittent chaos combined with sliding-window symbol sequence statistics to detect early faults in hydraulic pumps. Jiang et al. (2014) proposed an improved adaptive multi-scale morphology analysis (IAMMA) in association with LMD to detect faults of hydraulic pumps using vibration signals. Lu et al. (2016) developed a two-step fault diagnosis method based on EMD and fuzzy C-means clustering: EMD and the Hilbert transform were employed to extract fault features from the discharge pressure signals, and the fuzzy C-means clustering algorithm was applied to recognize hydraulic pump fault conditions. Meanwhile, Lu et al. (2015) also investigated fault severity recognition methods for hydraulic piston pumps, which attained higher recognition accuracy. He et al. (2016) investigated multiple fault detection in rotating machinery by means of MED combined with spectral kurtosis (SK) analysis, which obtained a good impulse restoration result for the vibration signals from a vacuum pump.

Generally speaking, once faults occur in the components of a working piston pump, they are reflected in the measured signals with certain characteristics, which can be extracted through signal processing techniques. Most fault diagnosis methods are based on matching the extracted fault features with the corresponding analyzed fault characteristics. Unfortunately, many studies have reported that the characteristic frequencies of some different faults are identical (Gao and Zhang 2006; Lu et al. 2015). Besides, the mechanism of some faults is unclear, i.e., it is hard to associate such a fault with its corresponding characteristic. In such cases, feature matching fails. To tackle this challenge, the main objective of this paper is to develop an automatic feature learning method for axial piston pump fault diagnosis.

Deep learning is a new branch of machine learning with a great capacity for learning features from raw data (Hinton and Salakhutdinov 2006; LeCun et al. 2015; LeCun and Bengio 1998). Although conventional shallow learning models, such as the support vector machine (Suganyadevi et al. 2016; Liu et al. 2017) and the artificial neural network (Trujillo et al. 2017; Huang et al. 2016), are usually employed to solve pattern recognition problems, they depend heavily on features manually selected by signal processing techniques. Compared with shallow learning models, a deep learning model can realize very complicated transformation and abstraction of the raw data through its multilayer structure (LeCun et al. 2015). That is to say, deep learning models can decrease the dependence on various signal processing techniques. The CNN (LeCun and Bengio 1998) is a kind of deep learning model with powerful feature learning ability, and it has enjoyed great success in a variety of fields (Vu et al. 2018; Rafique et al. 2018; Liu et al. 2018). Nowadays, many researchers have exploited the excellent performance of CNNs in mechanical fault diagnosis. Ince et al. (2016) constructed a motor condition monitoring and early fault detection system by employing one-dimensional CNNs. Appana et al. (2018) proposed a CNN-based feature learning and fault diagnosis method for the condition monitoring of bearings under varying rotational speeds. Janssens et al. (2016) introduced the CNN to autonomously learn useful features for bearing fault detection from raw data. Wang et al. (2018a, b) proposed CNN-based hidden Markov models for rolling element bearing fault identification. Ding and He (2017) presented an energy-fluctuated multi-scale feature mining approach based on wavelet packet energy and convolutional networks to classify faults in spindle bearings.

In light of the feature learning ability of CNN models and the great impulse restoration capacity of the MED technique, a MED-enhanced CNN model is proposed for detecting faults in axial piston pumps. The contributions of this work are summarized as follows:

  1. To solve the automatic feature learning problem in detecting faults of axial piston pumps, the CNN model is introduced, which effectively avoids the feature matching difficulty caused by unclear fault mechanisms.

  2. To improve the feature learning stability and the classification accuracy, the MED technique is used to preprocess the raw vibration data. In fact, this preprocessing step acts much like a data cleaning method (Xu et al. 2019).

  3. The feature learning process is visualized using the t-distributed stochastic neighbor embedding (t-SNE) technique (van der Maaten and Hinton 2008), and the stability of the MED-enhanced CNN is demonstrated by repeated learning trials.

The remainder of this paper is organized as follows. Section 2 gives a brief review of the basic theory of MED and CNN. In Sect. 3, the fault detection method is introduced in detail. A benchmark study is investigated in Sect. 4. In Sect. 5, the proposed method is applied to analyze the vibration signals from an axial piston pump. Concluding remarks are drawn in Sect. 6.

2 A brief review of MED and CNN

2.1 Minimum entropy deconvolution

The minimum entropy deconvolution technique aims at modeling an inverse filtering process against the common signal transfer process (Sawalhi et al. 2007; Endo and Randall 2007). The main target is to find an optimum inverse filter coefficient vector f that recovers the fault impact by

$$ \varvec{y} = \varvec{f}*\varvec{x} . $$
(1)

As shown in Fig. 1, without any prior knowledge about the input x, the MED filter could adaptively adjust the filter coefficients by optimizing the objective function of the output y.

Fig. 1 The main process of the MED filtering

High-order statistics are well known for depicting the shape of the probability density function (PDF). For example, a high value of kurtosis (a fourth-order statistic) is usually related to spikes in the PDF. Therefore, kurtosis is often employed as an objective function that quantifies the impulsiveness of a signal, and is given as

$$ K\left( {y\left( n \right)} \right) = \frac{{\tfrac{1}{N}\sum\nolimits_{n = 1}^{N} {\left| {y\left( n \right)} \right|^{4} } }}{{\left[ {\tfrac{1}{N}\sum\nolimits_{n = 1}^{N} {\left| {y\left( n \right)} \right|^{2} } } \right]^{2} }} $$
(2)

where y(n) \( (n = 1,2, \ldots ,N) \) is the output sequence after data sampling. As mentioned by Endo and Randall (2007), kurtosis is an effective indicator that reflects the “peakiness” of a signal, and therefore the presence of impulses.
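A minimal Python sketch of this objective is shown below; it simply evaluates Eq. (2) for a sampled sequence and is provided only as an illustration.

```python
import numpy as np

def kurtosis_objective(y):
    """Kurtosis of Eq. (2): the mean of |y|^4 divided by the squared mean of |y|^2.
    Large values indicate a spiky, impulse-rich signal."""
    y = np.asarray(y, dtype=float)
    return np.mean(np.abs(y) ** 4) / np.mean(np.abs(y) ** 2) ** 2
```

For a zero-mean Gaussian sequence this value is close to 3, whereas a sparse train of strong impulses drives it much higher, which is why maximizing it favors deconvolved fault impacts.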

The main process of the MED filtering is summarized as follows:

  • Step 1 Construct the relationship between y and f. The relationship can be generalized as

    $$ y\left( n \right) = \sum\limits_{l = 1}^{L} {f\left( l \right)\;x\left( {n - l + 1} \right)} $$
    (3)

    where x(n) \( (n = 1,2, \ldots ,N) \) is the input sequence after data sampling and f(l) \( (l = 1,2, \ldots ,L) \) is the lth element of f. Differentiating Eq. (3) with respect to f(l) gives

    $$ \frac{\partial y\left( n \right)}{\partial f\left( l \right)} = x\left( {n - l + 1} \right) . $$
    (4)
  • Step 2 Maximize the objective function, i.e., the kurtosis \( O_{2}^{4} \left( {y\left( n \right)} \right) \) defined in Eq. (2), with respect to f(l) by setting

    $$ \frac{{\partial \left( {O_{2}^{4} \left( {y\left( n \right)} \right)} \right)}}{{\partial \left( {f\left( l \right)} \right)}} = 0 $$
    (5)
  • Step 3 Update f by adjusting f(l) iteratively through

    $$ \sum\limits_{n = 1}^{N + L - 1} {\left\{ {\left[ {\sum\limits_{k = 1}^{L} {f\left( k \right)\;x\left( {n - k + 1} \right)} } \right]x\left( {n - l + 1} \right)} \right\}} = \frac{{\sum\nolimits_{n = 1}^{N + L - 1} {\left( {y\left( n \right)} \right)^{2} } }}{{\sum\nolimits_{n = 1}^{N + L - 1} {\left( {y\left( n \right)} \right)^{4} } }}\left[ {\sum\limits_{n = 1}^{N + L - 1} {\left( {y\left( n \right)} \right)^{3} } x\left( {n - l + 1} \right)} \right] $$
    (6)
  • Step 4 Obtain the final output y, which approximates the fault impact, from the updated f using Eq. (3). A minimal code sketch of this iterative procedure is given below.
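To make the four steps concrete, the sketch below implements the iterative MED update in Python with NumPy and SciPy. It is a simplified illustration under stated assumptions: the inverse filter is initialized as a delayed unit impulse, the autocorrelation matrix on the left-hand side of Eq. (6) is formed once from the input, the filtered output is kept at the input length rather than the full N + L − 1 convolution length, and the default filter length and iteration count (30 and 50) are simply the values used later in Sect. 4; none of these implementation details are prescribed by the derivation above.

```python
import numpy as np
from scipy.linalg import toeplitz, solve
from scipy.signal import lfilter

def med_filter(x, filt_len=30, n_iter=50, eps=1e-12):
    """Iterative MED filtering following Eqs. (3)-(6).

    x        : 1-D raw vibration signal
    filt_len : length L of the inverse filter f
    n_iter   : number of update iterations
    Returns the filtered output y and the learned filter f.
    """
    x = np.asarray(x, dtype=float)
    # Left-hand side of Eq. (6): Toeplitz matrix of input autocorrelations
    r = np.correlate(x, x, mode="full")
    mid = len(x) - 1
    A = toeplitz(r[mid:mid + filt_len])

    # Start from a delayed unit impulse (an assumption, not given in the paper)
    f = np.zeros(filt_len)
    f[filt_len // 2] = 1.0

    for _ in range(n_iter):
        y = lfilter(f, [1.0], x)                          # Eq. (3): y = f * x
        scale = np.sum(y ** 2) / (np.sum(y ** 4) + eps)   # scalar factor in Eq. (6)
        # Right-hand side of Eq. (6): cross-correlation of y^3 with the input
        b = np.array([np.dot(y[l:] ** 3, x[:len(x) - l]) for l in range(filt_len)])
        f = solve(A, scale * b)                           # update the filter taps
        f /= np.linalg.norm(f) + eps                      # keep the filter bounded

    return lfilter(f, [1.0], x), f
```

In practice, the loop can also be stopped early once the kurtosis of y (Eq. (2)) stops increasing.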

2.2 Convolutional neural network

A CNN is a kind of deep learning model with distinctive operations, namely convolution, subsampling and full connection. A CNN has three main traits: local receptive fields, subsampling and weight sharing (LeCun and Bengio 1998).

Figure 2 demonstrates the main architecture of a CNN model with five hidden layers: convolutional layer C1 (first hidden layer), subsampling layer S1 (second hidden layer), convolutional layer C2 (third hidden layer), subsampling layer S2 (fourth hidden layer) and fully connected layer FC (fifth hidden layer). In order to describe the transfer process concisely, the input layer is denoted “S0.”

Fig. 2 The common architecture of a CNN model

Supposing an input map \( x^{{S_{0} }} \) to the CNN model, the kth (\( k = 1,2, \ldots ,K \), where K is the number of feature maps) feature map \( x_{k}^{{C_{1} }} \) in layer C1 can be represented by

$$ x_{k}^{{C_{1} }} = f\left( {x^{{S_{0} }} *w_{k}^{{C_{1} }} + b_{k}^{{C_{1} }} } \right) $$
(7)

where f(∙) is the output activation function, \( w_{k}^{{C_{1} }} \) denotes the kth kernel of C1, and \( b_{k}^{{C_{1} }} \) is the kth bias of C1. Each kernel acts like a small window observing local fields of \( x^{{S_{0} }} \) with a certain stride; this local receptive field is the first trait of a CNN.

The precise location of data features is no longer important once they have been captured by the convolution operation. Hence, a subsampling operation follows directly to decrease the computational complexity. As shown in Fig. 2, the K feature maps in C1 are the input maps to layer S1, and the transformation in this layer can be expressed by

$$ x_{k}^{{S_{1} }} = f\left( {s_{k}^{{S_{1} }} {\text{down}}\left( {x_{k}^{{C_{1} }} } \right) + b_{k}^{{S_{1} }} } \right) $$
(8)

where \( x_{k}^{{S_{1} }} \) is the kth feature map in S1, \( s_{k}^{{S_{1} }} \) denotes the kth scale of S1, down(∙) represents the subsampling function and \( b_{k}^{{S_{1} }} \) is the kth bias of S1. This is the second trait of a CNN.

Different from the situation of one input map in Eq. (7), the kth feature map \( x_{k}^{{C_{2} }} \) in layer C2 is

$$ x_{k}^{{C_{2} }} = f\left( {\sum\limits_{{i \in M_{k} }} {x_{i}^{{S_{1} }} *w_{ik}^{{C_{2} }} + b_{k}^{{C_{2} }} } } \right) $$
(9)

where Mk represents a selection of feature maps in layer S1, i indexes the feature maps in Mk, \( w_{ik}^{{C_{2} }} \) is the corresponding kernel of C2 and \( b_{k}^{{C_{2} }} \) is the kth bias of C2. Equation (9) indicates that every location of a feature map \( x_{i}^{{S_{1} }} \) is convolved with the same weights \( w_{ik}^{{C_{2} }} \), which illustrates the weight sharing trait of a CNN.

Similarly, the kth feature map \( x_{k}^{{S_{2} }} \) in layer S2 is

$$ x_{k}^{{S_{2} }} = f\left( {s_{k}^{{S_{2} }} {\text{down}}\left( {x_{k}^{{C_{2} }} } \right) + b_{k}^{{S_{2} }} } \right) . $$
(10)

The feature maps in S2 are fully connected in layer FC. Input maps are transformed hierarchically by the feedforward pass, and the weights and biases in each hidden layer are updated by the backpropagation pass. The learned features are obtained through the whole training process, the details of which can be found in Bouvrie (2006).
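As an illustration of Eqs. (7)–(10), a minimal PyTorch sketch of the C1-S1-C2-S2-FC layout in Fig. 2 is given below. It is only a sketch under stated assumptions: the kernel sizes, feature-map counts and sigmoid activation are illustrative choices rather than the values of Table 3, plain average pooling replaces the trainable scale \( s_{k} \) and bias \( b_{k} \) of the subsampling layers, and the loss function and training loop are omitted.

```python
import torch
import torch.nn as nn

class SimpleCnn(nn.Module):
    """Illustrative CNN with the five hidden layers of Fig. 2 for a 20 x 20 input map."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5)     # Eq. (7):  1 @ 20x20 ->  6 @ 16x16
        self.s1 = nn.AvgPool2d(2)                    # Eq. (8):  6 @ 16x16 ->  6 @ 8x8
        self.c2 = nn.Conv2d(6, 12, kernel_size=5)    # Eq. (9):  6 @ 8x8   -> 12 @ 4x4
        self.s2 = nn.AvgPool2d(2)                    # Eq. (10): 12 @ 4x4  -> 12 @ 2x2
        self.fc = nn.Linear(12 * 2 * 2, n_classes)   # fully connected output layer
        self.act = nn.Sigmoid()                      # activation f(.) of Eqs. (7)-(10)

    def forward(self, x):                            # x: (batch, 1, 20, 20)
        x = self.act(self.c1(x))
        x = self.act(self.s1(x))
        x = self.act(self.c2(x))
        x = self.act(self.s2(x))
        return self.fc(torch.flatten(x, 1))
```

A forward call such as SimpleCnn()(torch.randn(8, 1, 20, 20)) returns one score per fault pattern for each input map.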

3 Automatic fault diagnosis using the MED based CNN

When faults occur, the vibration signals differ from those of a normally working pump and exhibit certain characteristics, such as impulses. However, the fault excitation impulses are usually too weak to be observed in strong noise environments (Antoni 2016; Li et al. 2017; Li and Zhao 2017; Xu et al. 2018). Meanwhile, owing to the disturbances of transfer paths in mechanical systems, the signals collected from a specific faulty component are inevitably distorted. Therefore, the MED technique is introduced for its great impulse restoration capacity.

In light of the strong feature learning ability of CNNs, this paper proposes a MED-based CNN for the multi-fault detection of axial piston pumps. The MED technique is first employed to enhance the fault impulses in the raw vibration signals. To address the fault feature matching failure problem, a deep CNN model is then applied to automatically learn fault features from the filtered vibration signals.

The flowchart of the automatic fault diagnosis procedure is described in Fig. 3. The general procedure is summarized as follows:

Fig. 3 The flowchart of the automatic fault diagnosis procedure

  • Step 1 Predefine the common fault patterns from 1 to R, which is the prerequisite for labeling the collected samples.

  • Step 2 The raw vibration data are collected from the axial piston pump under the different fault patterns by the data acquisition system.

  • Step 3 The collected raw signals are filtered using the MED technique to largely remove the disturbances of transfer paths and environments.

  • Step 4 The MED-filtered signals are directly separated into a training sample set and a testing sample set.

  • Step 5 The CNN model is trained hierarchically by alternating convolution and subsampling operations using the training sample set.

  • Step 6 The feature learning process is visualized at the two convolutional stages with t-SNE.

  • Step 7 The testing sample set is treated as unknown faults to obtain the final fault classification results. In real applications, an unknown fault pattern r must belong to the predefined fault pattern categories.

It should be noted that it is the lack of faulty training samples that prevents intelligent classification based methods from practical application. Numerical simulation of mechanical systems (Xiang and Zhong 2016) might be a feasible tool to establish fault samples for machines under all kinds of working conditions.

4 Simulation verification with benchmark data

In this section, simulation verification is conducted with the benchmark data from the Case Western Reserve University (CWRU) bearing data center (http://csegroups.case.edu/bearingdatacenter/home). The experimental bearing test setup contains an electrical driving motor, a torque transducer and encoder, and a dynamometer.

4.1 Data source description

Table 1 gives the detailed bearing data specifications for this case. Twelve bearing operating states are considered, as given in Table 2. The time waveforms of the vibration signals (the first 12,000 points) and the corresponding spectra are displayed in Fig. 4.

Table 1 Bearing specifications
Table 2 Fault category and sample distribution in simulation verification
Fig. 4 Time waveforms and corresponding spectra: a normal state, b 0.007 in./inner race fault, c 0.007 in./ball fault, d 0.007 in./outer race fault, e 0.014 in./inner race fault, f 0.014 in./ball fault, g 0.014 in./outer race fault, h 0.021 in./inner race fault, i 0.021 in./ball fault, j 0.021 in./outer race fault, k 0.028 in./inner race fault, l 0.028 in./ball fault

4.2 Results and analysis

The simulation verification applies the CWRU data described above to evaluate the feature learning performance of the proposed MED-based CNN model. Every bearing state contains 120,000 data points, giving 2400 (200 × 12) training samples and 1200 (100 × 12) testing samples in total. The size of the input map to the CNN model is 20 × 20. The filter length of the MED is chosen as 30, and the number of iterations is 50. The main parameters of the CNN model are listed in Table 3.

Table 3 The main parameters of the CNN model
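The following Python sketch shows one plausible way to turn the 120,000 points of each bearing state into 200 training and 100 testing input maps of size 20 × 20 after MED filtering. The non-overlapping segmentation and the ordering of the train/test split are assumptions; the paper states only the sample counts and the map size.

```python
import numpy as np

def segment_state(filtered_signal, n_train=200, n_test=100, map_size=(20, 20)):
    """Cut one MED-filtered record into non-overlapping 400-point samples and
    reshape each into a 20 x 20 input map (300 samples use exactly 120,000 points)."""
    sample_len = map_size[0] * map_size[1]             # 400 points per sample
    n_total = n_train + n_test
    segments = np.asarray(filtered_signal[:n_total * sample_len], dtype=float)
    maps = segments.reshape(n_total, *map_size)
    return maps[:n_train], maps[n_train:]              # training maps, testing maps
```

Repeating this for all twelve states and stacking the results yields the 2400 training and 1200 testing samples described above.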

The classification results using the MED-based CNN (MED–CNN) and the traditional CNN are shown in Fig. 5a, b, respectively. The confusion matrix is employed as an effective tool to describe the classification results: the classification accuracies are listed on the diagonal of the matrix, and the remaining elements describe the probability that a certain fault pattern is identified as another pattern. For example, as shown in Fig. 5a, the classification accuracy ratio of pattern 1 is 98%, and 2% of pattern 1 is identified as pattern 11. The average classification accuracy ratio using the MED-based CNN is 97.33%, while that of the traditional CNN only reaches 34.50%.

Fig. 5 Multi-class confusion matrices in simulation verification

To validate the feature learning ability of the CNN model in diagnosing mechanical faults, the t-distributed stochastic neighbor embedding (t-SNE) technique is introduced for feature visualization. The feature learning process is described by three-dimensional scatter plots (3D plots) at three representative stages, i.e., the input stage, the C1-S1 learning stage and the C2-S2 learning stage. In practice, a complete visual impression can be obtained by rotating the 3D plot to observe the learned features.
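A minimal sketch of this visualization with scikit-learn and Matplotlib is given below; flattening the learned feature maps into vectors before the embedding and the chosen perplexity are assumptions, as the paper does not report the t-SNE settings.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_stage_features(features, labels, title):
    """Embed one learning stage (input, C1-S1 or C2-S2) into three dimensions
    with t-SNE and draw a 3D scatter plot like those of Fig. 6.

    features : (n_samples, n_features) array of flattened feature maps
    labels   : (n_samples,) integer fault-pattern labels
    """
    emb = TSNE(n_components=3, perplexity=30, random_state=0).fit_transform(features)
    ax = plt.figure().add_subplot(projection="3d")
    sc = ax.scatter(emb[:, 0], emb[:, 1], emb[:, 2], c=labels, cmap="tab20", s=8)
    ax.set_xlabel("D1"); ax.set_ylabel("D2"); ax.set_zlabel("D3")
    ax.set_title(title)
    plt.colorbar(sc, label="fault pattern")
    plt.show()   # the 3D plot can be rotated interactively to inspect the clusters
```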

Figure 6a1–a3 gives a vivid description of the feature learning processes of the MED–CNN (D1, D2 and D3 denote the three dimensions). Figure 6a1 shows the features of the input data after MED filtering. Pattern 8 and pattern 10 are well separated after the MED filtering process; however, the other ten patterns are still mixed together. Features of the twelve patterns are learned progressively through the second stage by the C1-S1 operations. As shown in Fig. 6a2, the fault patterns can already be identified using the learned features. Compared with the second stage, the features are further learned with smaller intra-category distances and larger inter-category distances, as can be seen in Fig. 6a3.

Fig. 6 The feature learning processes in simulation verification

The feature learning processes of the traditional CNN model are illustrated by the same three stages. As shown in Fig. 6b1–b3, the features are learned continuously but still fail to separate the individual patterns, except for patterns 1, 2 and 4. From Figs. 5 and 6, we can conclude that the feature learning ability of the traditional CNN model is greatly enhanced by the MED filtering.

Table 4 shows the comparison results among the traditional stacked auto-encoder (SAE), the MED-enhanced SAE (MED–SAE), the traditional CNN and the present MED–CNN. The SAE is chosen for comparison because it is a feature mapping model that can also learn features automatically through its encoding and decoding process. The architecture of the SAE here is 400-200-100-80-12, the learning rate is 0.05, the number of pre-training epochs is 10, and the number of fine-tuning epochs is 500 (these values are chosen by experience). It can be seen that the MED technique indeed improves the classification accuracy of the SAE from 52.67% to 58.17%; however, this still cannot achieve a satisfactory result. The present MED-enhanced CNN shows clear superiority over the others.

Table 4 Comparison results in simulation verification
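For reference, a compact PyTorch sketch of this SAE baseline is given below. Only the fine-tuning classifier is shown; the greedy layer-wise pre-training of the 400-200-100-80 encoder (10 epochs per layer at a learning rate of 0.05) is omitted, and the sigmoid activation is an assumption.

```python
import torch.nn as nn

class SaeClassifier(nn.Module):
    """400-200-100-80-12 stacked auto-encoder classifier used as a comparison baseline."""
    def __init__(self, layer_sizes=(400, 200, 100, 80), n_classes=12):
        super().__init__()
        layers = []
        for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            layers += [nn.Linear(d_in, d_out), nn.Sigmoid()]      # encoder stack
        self.encoder = nn.Sequential(*layers)
        self.classifier = nn.Linear(layer_sizes[-1], n_classes)   # output layer

    def forward(self, x):            # x: (batch, 400) flattened vibration sample
        return self.classifier(self.encoder(x))
```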

To further evaluate the effectiveness and robustness of the MED–CNN in detecting multiple faults in mechanical systems, thirty trials are conducted: the model is trained with six batch sizes under five maximum iteration numbers. Figure 7 shows the average accuracy ratio in the 30 trials. The maximum iteration number is chosen as 500, 400, 300, 200 and 100, and the batch size is given successively as 30, 25, 20, 15, 10 and 5. The detailed accuracies of the 30 trials are listed in Tables 5 and 6, respectively.
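The 30-trial grid can be organized as sketched below, where train_and_evaluate is a hypothetical placeholder for the training routine (it trains a fresh model with the given settings and returns the average testing accuracy); it is not part of the original study.

```python
from itertools import product

max_iterations = [500, 400, 300, 200, 100]   # five maximum iteration numbers
batch_sizes = [30, 25, 20, 15, 10, 5]        # six batch sizes -> 30 trials in total

accuracy = {}
for n_iter, batch in product(max_iterations, batch_sizes):
    # train_and_evaluate is a hypothetical helper, not defined in the paper
    accuracy[(n_iter, batch)] = train_and_evaluate(max_iter=n_iter, batch_size=batch)
```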

Fig. 7 The average accuracy ratio of the 30 trials in simulation verification

Table 5 The average accuracy ratio of the 30 trials based on MED-CNN in simulation verification
Table 6 The average accuracy ratio of the 30 trials based on the traditional CNN in simulation verification

As shown in Fig. 7, the MED–CNN gives a higher and steadier classification accuracy ratio than the traditional CNN. It can also be seen that, as the number of iterations increases, the classification becomes steadier. Moreover, apart from trial 23, the batch sizes 30, 25, 15 and 10 give similarly high classification accuracies, while the batch size of 5 gives a lower classification accuracy. Deepening the network did not obviously improve the classification accuracy: a MED–CNN with seven hidden layers (i.e., three convolutional layers, three subsampling layers, and one FC layer) was also tested, and Table 7 lists the results.

Table 7 Results of MED-CNN with 7 hidden layers in simulation verification

5 Experimental investigation for axial piston pumps

5.1 Data source description

This section is devoted to the experimental investigation of the diagnosis of common faults in axial piston pumps. Figure 8 shows the experimental platform. The tested axial piston pump (A in Fig. 8), made by Ningbo Hilead Hydraulic Co., Ltd. (P. R. China), is located at the end of the test rig. The data acquisition system includes a signal conditioner, a laptop with data acquisition software and multiple accelerometers. Some parameters of the tested axial piston pump are listed in Table 8.

Fig. 8 Experimental platform for the tested axial piston pump

Table 8 Some parameters of the tested pump

In the experimental investigation, the vibration signals of one channel (accelerometer #3, marked with a red circle in Fig. 8), sampled at 48 kHz, are used. The pump operating states and sample distributions are listed in Table 9. Every operating state contains 160,000 data points, giving 1500 (300 × 5) training samples and 500 (100 × 5) testing samples.

Table 9 Fault category and sample distribution in experimental investigation

The four common faults (shown in Fig. 9, marked with red circles) are as follows:

Fig. 9 The four common faults in the axial piston pump

  (a) Wear in three pistons, with a 0.03 mm wear amount in diameter on the tagged pistons.

  (b) Blocked support hole in the static pressure slippers.

  (c) Wear in the shaft shoulder, with a 0.03 mm wear amount in diameter.

  (d) Cylinder block with a pitting defect, 0.5 mm in width and 0.3 mm in depth.

The time waveforms of the vibration signals (the first 0.5 s) and the corresponding spectra are displayed in Fig. 10.

Fig. 10 Time waveforms and corresponding spectra: a normal state, b wear in three pistons, c blocked support hole in static pressure slippers, d wear in shaft shoulder, e cylinder block with a pitting defect

5.2 Results and analysis

In order to realize automatic feature learning from the raw data, 400 data points are adopted as one sample; therefore, the size of the input map is 20 × 20. In this case study, the same model parameters as listed in Table 3 are adopted.

Figure 11 demonstrates the multi-class confusion matrices for the experimental data. Figure 11a, b gives the classification results using the MED–CNN and the traditional CNN, respectively. The average classification accuracy ratio using the MED-based CNN is 100%, while that of the traditional CNN remains 71.20%.

Fig. 11 Multi-class confusion matrices in experimental investigation

Every fault pattern is completely identified as itself using the MED–CNN model, as demonstrated by the diagonal of the confusion matrix in Fig. 11a. The traditional CNN model does not perform as well. As shown in Fig. 11b, pattern 3 is identified as itself with 100% accuracy; 2% of pattern 1 is identified as pattern 4; 2% of pattern 2 is identified as pattern 1 and 26% as pattern 4; 4% of pattern 4 is identified as pattern 1, 54% as pattern 2 and 6% as pattern 5; and 18% of pattern 5 is identified as pattern 1, 9% as pattern 2 and 23% as pattern 4.

To validate the feature learning ability of the CNN model in diagnosing multiple faults of axial piston pumps, three-dimensional scatter plots are again employed to describe the learning results at the three representative stages.

Figure 12a1–a3 gives a vivid description of the pump fault feature learning processes of the MED–CNN. Figure 12a1 shows the features of the pump data after MED filtering. The fault features are already comparatively visible, even with large intra-category distances, after the MED filtering process. Features of the five patterns of the axial piston pump are then learned by the C1-S1 operations; as shown in Fig. 12a2, the fault patterns are explicitly separated using the learned features. Compared with the second stage, the features are further learned with smaller intra-category distances and larger inter-category distances, as can be seen in Fig. 12a3.

Fig. 12 The feature learning processes in experimental investigation

Similarly, the pump fault feature learning processes of the traditional CNN model are illustrated at the same three stages. As shown in Fig. 12b1–b3, the features are learned continuously but still fail to separate the individual patterns, except for pattern 3. The traditional CNN has only a limited ability to describe the explicit features of the axial piston pump data.

Table 10 shows the comparison results in this case study. The architecture of the SAE here is 400-200-100-80-5, the learning rate is 0.05, the number of pre-training epochs is 10, and the number of fine-tuning epochs is 500 (the same as in the simulation verification). The results again demonstrate the effectiveness of the present MED–CNN.

Table 10 Comparison results in experimental investigation

To further evaluate the effectiveness and robustness of the MED–CNN in detecting multiple faults in the axial piston pump, thirty trials are conducted in the same way as in the simulation verification section.

Figure 13 shows the average accuracy ratio for the axial piston pump in the 30 trials. As in the simulation verification, the maximum iteration number is chosen as 500, 400, 300, 200 and 100, and the batch size as 30, 25, 20, 15, 10 and 5. The detailed accuracies of the 30 trials are listed in Tables 11 and 12, respectively.

Fig. 13 The average accuracy ratio of the 30 trials in experimental investigation

Table 11 The average accuracy ratio of the 30 trials based on MED-CNN in experimental investigation
Table 12 The average accuracy ratio of the 30 trials based on the traditional CNN in experimental investigation

As shown in Fig. 13, the MED–CNN gives a higher and steadier classification accuracy ratio than the traditional CNN. As the number of iterations increases, the classification accuracies tend to become steadier at high values, which illustrates the effectiveness and robustness of the MED–CNN in detecting multiple faults in axial piston pumps. The MED-CNN with seven hidden layers is also tested in this case study, and its results are given in Table 13.

Table 13 Results of MED-CNN with 7 hidden layers in experimental investigation

6 Conclusion

Faults occurring in piston pumps are difficult to detect owing to the complex working environment of hydraulic systems. In order to remove the reliance on feature selection expertise, this paper proposed a MED-based CNN model to automatically detect faults in axial piston pumps. Both simulations and experiments were conducted to investigate the performance of the present model in comparison with the traditional CNN model. Using the present model, the average classification accuracy ratios for the benchmark data with 12 classes of bearing operating states from CWRU and for the experimental data with five classes of commonly occurring faults from the tested axial piston pump are 97.33% and 100%, respectively, whereas the traditional CNN model only attains 34.50% and 71.20%, respectively. In addition, the superior classification robustness of the present model is illustrated by thirty trials under different iteration numbers and batch sizes. Therefore, the proposed model can learn effective fault features, gives comparably satisfactory results in multi-fault classification, and is expected to classify faults in more complex mechanical systems.