Abstract
For bearing fault diagnosis with small samples, a new model combining signal denoising and entropy feature fusion (EFF) is proposed (WHO–VMD–CCWT–EFF). It is based on variational mode decomposition (VMD) optimized by the wild horse optimizer (WHO) and on a correlation coefficient weight threshold (CCWT). For signal denoising, the power spectrum entropy is first taken as the fitness function, and the WHO is used to optimize the VMD parameters. Secondly, intrinsic mode functions (IMFs) whose correlation coefficients with the original signal are less than 0.2 are removed, the remaining IMFs are weighted by their correlation coefficients, and the weighted IMFs are summed to reconstruct the signal. Then, the refined composite multiscale dispersion entropy (RCMDE), refined composite multiscale fluctuation dispersion entropy (RCMFDE), refined composite multivariate generalized multiscale fuzzy entropy (RCmvMFE), refined composite multivariate generalized multiscale sample entropy (RCmvMSE), and multiscale permutation entropy (MPE) of the signal are calculated and fused. Finally, a Fisher discriminant classifier performs the fault diagnosis. Using the Case Western Reserve University (CWRU) dataset and the Paderborn dataset, the proposed model achieves an accuracy of over 99% in 12 single-working-condition and 30 multiple-working-condition experiments. Compared with existing feature fusion methods, the WHO–VMD–CCWT–EFF model fuses only five selected features and accurately diagnoses bearing faults in small-sample experiments covering 42 different artificial and real damages. This indicates that the model generalizes well across datasets and working conditions.
1 Introduction
In recent years, health monitoring has become a pressing issue in the mechanical industry. Some scholars have studied the health status of structures such as beams and trusses [1, 2]. With the development of automated machinery and equipment, interest in the fault diagnosis of machine components such as bearings has also grown [3,4,5]. At present, bearing fault diagnosis methods fall into three main categories: signal processing and analysis, traditional fault diagnosis based on feature extraction, and deep learning methods that extract features automatically [6]. In practice, the three types of methods each have their own advantages and complement one another. From the perspective of signal processing, Li Y et al. proposed a time–frequency analysis (TFA) post-processing algorithm called local maximum high order time iterative synchrosqueezing (LHTIS) [7] and demonstrated its effectiveness on fault signals. Haiyang Pan et al. [8] proposed a multi-class fuzzy support matrix machine and successfully applied it to roller bearing fault diagnosis. In one study, the VMD parameters and kernel fuzzy c-means (KFCM) were each optimized, and the bearing fault types of small samples were then identified [9]. Another study proposed a bearing fault diagnosis method based on the wavelet packet transform and a convolutional neural network optimized by the simulated annealing algorithm [10]. Ensemble self-taught learning convolutional auto-encoders (STL-CAEs) [11] have been proposed to address the problem of scarce labeled data. Combining Dempster–Shafer (DS) evidence theory with support vector machines (SVM) has also been used in bearing fault diagnosis [12]. A fault diagnosis method called RSG is proposed in [13]. Yang S et al. [14] combined a two-dimensional convolutional neural network (2DCNN) feature extractor with a random forest (RF) classifier to build a fault diagnosis model for high-speed bearings in offshore wind turbines, achieving an accuracy of 99.5% with 700 training samples and 300 test samples. Ref. [15] extracted 30 training samples and 30 test samples, computed and combined their time-domain and frequency-domain features, and identified the fault type with a deep neural network at an accuracy of 99.1%. In [16], the weighted signal difference average (WSDA) was proposed as a new fitness function for optimizing VMD, and a one-dimensional neural network was used for rolling bearing fault diagnosis; with 5000 training samples and 1000 test samples, the diagnostic accuracy was 99.6%. In [17], few-shot learning was applied to rolling bearing fault diagnosis and verified under mixed working conditions: with 60, 200, 900, and 19,800 training samples and 75 test samples, the accuracies were 82.8%, 94.32%, 98.55%, and 99.77%, respectively.
Previous research shows that deep learning is widely used in bearing fault diagnosis, owing to its strengths in feature extraction and classification. However, the high accuracy of deep learning often comes at the cost of a large number of training samples. In addition, some studies consider only a single working condition, and how well their models transfer to other working conditions remains to be verified. Based on this analysis, WHO-VMD-CCWT-EFF is proposed in this paper. To verify the usability and universality of the model, the CWRU dataset [18] and the Paderborn dataset [19] are used for various single- and multiple-working-condition experiments. The experimental results indicate that WHO-VMD-CCWT-EFF achieves good results with only 10 training samples and 90 test samples per fault class. The main contributions of this paper are summarized as follows:
1. A correlation coefficient weight threshold denoising method is proposed to denoise the fault signal decomposed by VMD.
2. To extract more discriminative features, an entropy feature fusion method is proposed, and the resulting bearing fault diagnosis method, WHO-VMD-CCWT-EFF, is verified experimentally.
3. A new deviation metric is used to measure the stability of the model and is validated in various experiments.
4. In addition to single-working-condition experiments, experiments mixing multiple working conditions are carried out with good results, and WHO-VMD-CCWT-EFF remains applicable and stable when only a small amount of data is available.
The rest of this paper is organized as follows: Section 2 introduces the theoretical basis related to the model. Section 3 introduces the framework of the model. The CWRU dataset and the Paderborn dataset are used for the experiments and analyses in Sect. 4. Finally, Sect. 5 gives the conclusion.
2 Theoretical backgrounds
2.1 WHO-VMD
VMD [20] is a non-recursive, adaptive signal decomposition method developed as an improvement on methods such as empirical mode decomposition (EMD). It decomposes the original signal in the frequency domain into intrinsic mode functions (IMFs), each with a limited bandwidth around a center frequency. WHO is a meta-heuristic optimization algorithm proposed by Iraj Naruei [21]. Similar optimization algorithms, such as the improved grey wolf optimization (IGWO), ant lion optimizer (ALO), and marine predator algorithm (MPA), have been applied successfully to various structural detection problems. This article selects the WHO to optimize the VMD parameters [22,23,24].
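To make concrete where \(k\) and \(\alpha\) enter the decomposition, the sketch below implements a simplified frequency-domain VMD loop in NumPy. It is an illustration, not the reference implementation or the code used in this paper. It assumes a one-sided FFT spectrum, no mirror extension of the signal, uniformly initialized center frequencies, the normalized-frequency penalty \(1 + \alpha(\omega - \omega_k)^2\), and a dual step `tau` of 0 by default. The function name `vmd` and its arguments are hypothetical.

```python
import numpy as np


def vmd(signal, k, alpha, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD sketch: ADMM-style updates on the one-sided spectrum.

    Assumptions (not from the paper): no mirror extension, uniform initial
    centre frequencies, normalised frequency axis in [0, 0.5].
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    f_hat = np.fft.rfft(x)                     # one-sided spectrum of the input
    freqs = np.fft.rfftfreq(n)                 # normalised frequencies 0 ... 0.5
    u_hat = np.zeros((k, freqs.size), dtype=complex)
    omega = 0.5 / k * np.arange(k)             # initial centre frequencies
    lam = np.zeros(freqs.size, dtype=complex)  # Lagrange multiplier (stays 0 if tau=0)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for i in range(k):
            # Part of the spectrum not explained by the other modes
            residual = f_hat - u_hat.sum(axis=0) + u_hat[i]
            # Band-pass update around omega[i]; larger alpha gives a narrower band
            u_hat[i] = (residual + lam / 2) / (1 + alpha * (freqs - omega[i]) ** 2)
            power = np.abs(u_hat[i]) ** 2
            omega[i] = freqs @ power / (power.sum() + 1e-12)  # spectral centroid
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    order = np.argsort(omega)                  # modes from low to high frequency
    return np.fft.irfft(u_hat[order], n=n, axis=1), omega[order]
```

A call such as `vmd(x, k=5, alpha=1106)` would decompose a signal with a parameter pair like the one reported in Sect. 4.1.1, although the scaling of \(\alpha\) in this sketch may differ from the implementation used in the paper. Too small a \(k\) merges components and too large a \(k\) splits them, which is why both parameters are tuned.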
WHO is mainly inspired by a behavior that distinguishes wild horses from other animals: foals leave their natal group before puberty and join other groups, which prevents mating between relatives. Besides mating, wild horses update their positions through social behaviors such as grazing, group leadership, and the exchange and selection of leaders.
When VMD is applied to bearing fault signals, the number of IMFs, i.e. the number of decomposition modes \(k\), and the quadratic penalty factor \(\alpha\) directly affect the decomposition quality. To select relatively optimal parameters, the VMD parameters \( \left( {k,\alpha } \right)\) are optimized by WHO, as shown in Fig. 1.
Step 1: Set the WHO parameters: the total number of wild horses N = 30, the maximum number of iterations Max_iter = 30, the crossover ratio PC = 0.13, the percentage of stallions in the population PS = 0.2, the number of stallions Nstallion = N*PS, and the number of foals in each group Nfoal = (N-Nstallion)/Nstallion. The parameters to be optimized are \(\alpha \in \left[ {100,{ }2000} \right]\) and \(k \in \left[ {4,{ }8} \right]\), with \(\alpha ,k \in Z\).
Step 2: Create the population, select the leaders, and calculate the fitness function values.
Step 3: If Rand > PC, update positions according to grazing behavior; otherwise, update them by mating behavior. Here Rand is a random number uniformly distributed in [0, 1].
Step 4: Update the group leaders and the stallions.
Step 5: If the maximum number of iterations is reached, output \(\left( {k,\alpha } \right)\); otherwise, return to Step 3.
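The five steps can be sketched as a NumPy loop. This is a simplified reading of WHO written for illustration, not Naruei's reference code, and the update rules are paraphrased from the general form of the algorithm, so they may differ in detail from the original. Assumptions: positions are rounded to integers because \(k\) and \(\alpha\) are integers, mating is reduced to averaging two foals, and the leader update uses a single branch of the original rule. `who_optimize`, `move`, and `adaptive_z` are hypothetical names; `fitness` would be, for example, a function that runs VMD with the candidate `(k, alpha)` and returns a power spectrum entropy.

```python
import numpy as np


def who_optimize(fitness, lb, ub, n_horses=30, max_iter=30, pc=0.13, ps=0.2, seed=0):
    """Simplified wild horse optimizer over an integer box [lb, ub] (sketch)."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    dim = lb.size
    n_stallion = int(round(n_horses * ps))            # Nstallion = N * PS (6 groups)
    n_foal = (n_horses - n_stallion) // n_stallion    # Nfoal = (N - Nstallion) / Nstallion

    def project(x):
        # Keep candidates integer-valued and inside the search box
        return np.clip(np.round(x), lb, ub)

    def adaptive_z(tdr):
        # Adaptive factor: one shared random scalar where the mask is on,
        # independent random values per dimension where it is off
        keep = rng.random(dim) < tdr
        return np.where(keep, rng.random(), rng.random(dim))

    def move(target, x, tdr):
        # Position rule 2 Z cos(2 pi R Z) (target - x) + target, with R in [-2, 2]
        z = adaptive_z(tdr)
        r = rng.uniform(-2.0, 2.0, dim)
        return project(2 * z * np.cos(2 * np.pi * r * z) * (target - x) + target)

    # Step 2: create the population and evaluate it
    stallions = project(rng.uniform(lb, ub, (n_stallion, dim)))
    foals = project(rng.uniform(lb, ub, (n_stallion, n_foal, dim)))
    s_fit = np.array([fitness(s) for s in stallions])
    f_fit = np.array([[fitness(f) for f in group] for group in foals])

    def exchange(g):
        # A foal that is fitter than its stallion becomes the new group leader
        j = f_fit[g].argmin()
        if f_fit[g, j] < s_fit[g]:
            stallions[g], foals[g, j] = foals[g, j].copy(), stallions[g].copy()
            s_fit[g], f_fit[g, j] = f_fit[g, j], s_fit[g]

    for g in range(n_stallion):
        exchange(g)
    best = stallions[s_fit.argmin()].copy()           # water hole (global best)
    best_fit = s_fit.min()

    for it in range(max_iter):                        # Step 5: fixed iteration budget
        tdr = 1.0 - it / max_iter                     # shrinks as the search proceeds
        for g in range(n_stallion):
            for f in range(n_foal):
                if rng.random() > pc:                 # Step 3: grazing around the stallion
                    foals[g, f] = move(stallions[g], foals[g, f], tdr)
                else:                                 # Step 3: mating with a random foal
                    mate = foals[rng.integers(n_stallion), rng.integers(n_foal)]
                    foals[g, f] = project((foals[g, f] + mate) / 2)
                f_fit[g, f] = fitness(foals[g, f])
            candidate = move(best, stallions[g], tdr)  # Step 4: leader moves near the water hole
            c_fit = fitness(candidate)
            if c_fit < s_fit[g]:
                stallions[g], s_fit[g] = candidate, c_fit
            exchange(g)
            if s_fit[g] < best_fit:
                best, best_fit = stallions[g].copy(), s_fit[g]
    return best, float(best_fit)
```

With the settings of Step 1, a search would look like `who_optimize(f, lb=[4, 100], ub=[8, 2000])`, where the first coordinate is \(k\), the second is \(\alpha\), and `f` is a (hypothetical) fitness function of the candidate pair.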
2.2 Power spectrum entropy
The power spectrum entropy [25] measures the uncertainty of signal energy across the power spectrum and thus quantifies the complexity of the signal's energy distribution in the frequency domain. In real industrial environments, bearing fault signals are collected in the presence of various noise sources, so their frequency content is complex. To reflect the fault characteristics contained in each frequency component, the power spectrum entropy is used as the fitness function of the optimization algorithm. A small power spectrum entropy indicates that the signal's frequency content is simple and its power is concentrated in a few frequency components, which reflects the characteristics of the fault signal. In addition, after VMD decomposition, the power spectrum entropy of each IMF is relatively stable within the same fault state and differs between fault states, which further supports its use as the fitness function.
Step 1: Define the original fault signal sequence as \(x\left( t \right) = \left\{ {x\left( 1 \right),x\left( 2 \right),x\left( 3 \right), \ldots ,x\left( L \right)} \right\}\)
where \(L\) is the length of the signal, \(P\left( i \right)\) is the power spectrum of the signal. \(x\left( w \right)\) is the Fourier transform of the signal.
Step 2: Obtain the power spectral density distribution function by normalization:
where \(N\) is the number of frequency components in the Fourier transform.
Step 3: Define the power spectrum entropy through the power spectral density distribution function as:
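Because the equations for Steps 1–3 are not reproduced here, the sketch below assumes the common textbook form: \(P(i) = |X(i)|^2 / L\) from the FFT, \(p_i = P(i) / \sum_{i=1}^{N} P(i)\), and \(H = -\sum_{i=1}^{N} p_i \ln p_i\). The one-sided spectrum (`numpy.fft.rfft`) and the natural logarithm are assumptions, and which signal (an IMF or a reconstruction) the fitness is evaluated on is not specified in this section. `power_spectrum_entropy` is a hypothetical name.

```python
import numpy as np


def power_spectrum_entropy(x):
    """Power spectrum entropy of a 1-D signal (assumed textbook definition)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size   # P(i), one-sided spectrum
    p = power / power.sum()                        # normalised distribution
    p = p[p > 0]                                   # treat 0 * ln 0 as 0
    return float(-np.sum(p * np.log(p)))
```

A pure tone concentrates its power in one frequency bin and gives an entropy near 0, while white noise spreads power over all bins and approaches \(\ln N\); a WHO fitness built on this function therefore favors decompositions with concentrated spectra.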
2.3 Correlation coefficient
The correlation coefficient is a description of the similarity between two random signals or deterministic signals. After the VMD decomposition of the bearing fault signal, the correlation between each IMF and the original bearing fault signal can be judged by calculating the correlation coefficient value. Then, it can be inferred from the correlation coefficient whether the IMF contains the main features of the original signal. Generally speaking, the closer the absolute value of the correlation coefficient is to 1, the higher the degree of correlation between the two, and the more obvious the features of the original signal contained in the IMF. The correlation coefficient \(R_{k}\) between the k-th IMF and the original signal is defined as:
where \(f\left( t \right)\) is the original signal, \(u_{k} \left( t \right)\) is the k-th IMF, and E and D denote the expected value and the variance, respectively.
2.4 Entropy features
2.4.1 RCMDE
RCMDE [26, 27] was first proposed in 2017 and applied to biomedical signals. It extends dispersion entropy with refined composite multiscale coarse-graining. The specific calculation steps are as follows:
Step 1: Divide the sequence X of length \(N\) into non-overlapping segments of length τ, starting from the k-th sample (k = 1, ..., τ). The averages of the segments form the k-th coarse-grained sequence.
where \(x_{k}^{\tau } = \left\{ {x_{k,1}^{\tau } ,x_{k,2}^{\tau } ,...} \right\}\) is the k-th coarse-grained sequence at the τ scale.
Step 2: Map the time series \(x_{k,j}^{\tau }\) to \(y_{k,j}^{\tau }\) with the normal cumulative distribution function in Eq. (6),
where \(u\) and \(\sigma\) denote the mean and the standard deviation of the coarse-grained sequence \(x_{k}^{\tau }\), respectively.
Step 3: Map the time series \(y_{k}^{\tau }\) to \(Z_{j}^{c}\) by Eq. (7)
where \({\text{Round}}()\) represents the rounding function, and \(c\) represents the number of categories.
Step 4: Calculate the embedding vector by Eq. (8).
where \(m\) is the embedding dimension and \(d\) is the time delay.
Step 5: Calculate the dispersion patterns and their probabilities. If \(z_{i}^{c} = v_{0}\), \(z_{i + d}^{c} = v_{1}\), ..., \(z_{{i + \left( {m - 1} \right)d}}^{c} = v_{m - 1}\), the dispersion pattern corresponding to \(z_{i}^{m,c}\) is \(\pi_{{v_{0} v_{1} \cdots v_{m - 1} }}\). The probability of each dispersion pattern is calculated by Eq. (9).
Step 6: Average the pattern probabilities over the τ coarse-grained sequences to obtain \(\overline{p}\left( {\pi_{{v_{0} v_{1} \cdots v_{m - 1} }} } \right)\), and compute RCMDE as the Shannon entropy of these averaged probabilities.
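The six steps can be sketched in NumPy as follows. The defaults `m=2`, `c=6`, `d=1` are illustrative choices, not parameters taken from this paper; the normal CDF is built from `math.erf`; and `rcmde` (a hypothetical name) returns the value for a single scale τ, so a multiscale feature vector would be obtained by calling it for several scales.

```python
import math

import numpy as np

_ncdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))


def _dispersion_probs(x, m, c, d):
    """Pattern probabilities of one coarse-grained series (Steps 2-5)."""
    y = _ncdf((x - x.mean()) / x.std())                    # Step 2: map to (0, 1)
    z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)   # Step 3: classes 1..c
    n = len(z) - (m - 1) * d
    emb = np.stack([z[j * d: j * d + n] for j in range(m)], axis=1)  # Step 4
    codes = ((emb - 1) * c ** np.arange(m)).sum(axis=1)    # one integer per pattern
    return np.bincount(codes, minlength=c ** m) / n        # Step 5


def rcmde(x, scale, m=2, c=6, d=1):
    """Refined composite multiscale dispersion entropy at one scale (sketch)."""
    x = np.asarray(x, dtype=float)
    probs = []
    for k in range(scale):                                 # Step 1: offsets 0..tau-1
        n_seg = (len(x) - k) // scale
        coarse = x[k: k + n_seg * scale].reshape(n_seg, scale).mean(axis=1)
        probs.append(_dispersion_probs(coarse, m, c, d))
    p = np.mean(probs, axis=0)                             # Step 6: average over k
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

Averaging the probabilities before taking the logarithm, rather than averaging the entropies of the individual coarse-grained series, is what makes the estimate "refined composite": rare patterns are less likely to be missing at larger scales.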
2.4.2 RCMFDE
RCMFDE builds on the work of Azami H et al. [28, 29] on dispersion entropy. Fluctuation dispersion entropy improves on dispersion entropy by accounting for the fluctuations of the time series while keeping stable performance and a low computational cost. Like RCMDE, RCMFDE obtains the dispersion patterns by Eqs. (5)–(8), and the probability of each dispersion pattern is calculated by Eq. (11),
where \({\text{count ()}}\) is the number of embedding vectors \(z_{i}^{m,c}\) mapped to the pattern \(\pi_{{v_{0} v_{1} \cdots v_{m - 1} }}\).
Calculate the average value of the dispersion pattern probabilities at scale τ, and the RCMFDE is obtained through \(\overline{p}\left( {\pi_{{v_{0} v_{1} \cdots v_{m - 1} }} } \right)\).
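The fluctuation variant changes only the pattern step. In the standard formulation of fluctuation dispersion entropy, the differences between adjacent class labels in each embedding vector (ranging from \(-(c-1)\) to \(c-1\)) form the pattern, so there are \((2c-1)^{m-1}\) possible patterns. The self-contained sketch below follows that formulation; `rcmfde` and its default parameters are illustrative assumptions, not values from this paper.

```python
import math

import numpy as np


def rcmfde(x, scale, m=2, c=6, d=1):
    """Refined composite multiscale fluctuation dispersion entropy at one scale (sketch)."""
    ncdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    x = np.asarray(x, dtype=float)
    n_patterns = (2 * c - 1) ** (m - 1)
    probs = []
    for k in range(scale):                         # tau coarse-grained series
        n_seg = (len(x) - k) // scale
        coarse = x[k: k + n_seg * scale].reshape(n_seg, scale).mean(axis=1)
        y = ncdf((coarse - coarse.mean()) / coarse.std())
        z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)   # classes 1..c
        n = len(z) - (m - 1) * d
        emb = np.stack([z[j * d: j * d + n] for j in range(m)], axis=1)
        # Fluctuation pattern: adjacent differences, shifted to 0..2c-2
        diffs = np.diff(emb, axis=1) + (c - 1)
        codes = (diffs * (2 * c - 1) ** np.arange(m - 1)).sum(axis=1)
        probs.append(np.bincount(codes, minlength=n_patterns) / n)
    p = np.mean(probs, axis=0)                     # average over the tau series
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

Because only differences are kept, a slowly varying signal produces mostly zero fluctuations and a low entropy, regardless of which amplitude classes it visits.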
2.4.3 RCmvMFE
RCmvMFE [30] is a tool proposed in 2017 to analyze the complexity of multi-channel signals. The detailed description of RCmvMFE is as follows:
Step 1: For a multivariate signal \(Y = \{ y_{k,b} \}_{k = 1,b = 1}^{p,C}\) containing \(p\) channels of length \(C\), coarse-graining yields the time series \(z_{\alpha }^{\left( \beta \right)} = \left\{ {x_{\alpha ,k,i}^{\left( \beta \right)} } \right\}\), where \(\beta\) is the scale factor and \(\alpha\) is the starting offset:
$$ x_{\alpha ,k,i}^{\left( \beta \right)} = \frac{1}{\beta }\mathop \sum \limits_{{b = \left( {i - 1} \right)\beta + \alpha }}^{i\beta + \alpha - 1} y_{k,b} ,\quad 1 \le i \le \left\lfloor {\frac{C}{\beta }} \right\rfloor = N,\; 1 \le k \le p,\; 1 \le \alpha \le \beta \tag{14} $$
Step 2: Perform the multivariate embedding reconstruction:
$$ X_{m} \left( i \right) = \left[ {x_{1,i} ,x_{{1,i + \tau_{1} }} , \ldots ,x_{{1,i + \left( {m_{1} - 1} \right)\tau_{1} }} ,x_{2,i} ,x_{{2,i + \tau_{2} }} , \ldots ,x_{{2,i + \left( {m_{2} - 1} \right)\tau_{2} }} , \ldots ,x_{p,i} ,x_{{p,i + \tau_{p} }} , \ldots ,x_{{p,i + \left( {m_{p} - 1} \right)\tau_{p} }} } \right] \tag{15} $$
where \(M = \left[ {m_{1} ,m_{2} , \ldots ,m_{p} } \right]\) and \(\tau = \left[ {\tau_{1} ,\tau_{2} , \ldots ,\tau_{p} } \right]\) are the embedding dimensions and delay times, respectively, \(n = {\text{max}}\left\{ M \right\} \times {\text{max}}\left\{ \tau \right\}\), and \(i = 1,2, \ldots ,N - n\).
Step 3: Calculate the distance between \(X_{m} \left( i \right)\) and \(X_{m} \left( j \right)\), where \(i \ne j\), as the maximum absolute difference between their corresponding components:
$$ d\left[ {X_{m} \left( i \right),X_{m} \left( j \right)} \right] = \mathop {{\text{max}}}\limits_{l = 1,2, \ldots ,m} \left\{ {\left| {x\left( {i + l - 1} \right) - x\left( {j + l - 1} \right)} \right|} \right\} \tag{16} $$
Step 4: Given the tolerance \(r\), the fuzzy power \(fp\), and the fuzzy membership function \(\theta \left( {d,r} \right)\), the average similarity \(\phi^{m} \left( r \right)\) for embedding dimension \(m\) is:
$$ \theta \left( {d,r} \right) = \exp \left( {\frac{{ - d^{fp} }}{r}} \right) $$
$$ \phi^{m} \left( r \right) = \frac{1}{{N - n}}\mathop \sum \limits_{i = 1}^{N - n} \frac{{\mathop \sum \nolimits_{j = 1,j \ne i}^{N - n} \exp \left( {\frac{{ - \left( {d\left[ {X_{m} \left( i \right),X_{m} \left( j \right)} \right]} \right)^{fp} }}{r}} \right)}}{N - n - 1} \tag{17} $$
Step 5: Let m = m + 1 and repeat Steps 2–4. Averaging Eq. (17) over the coarse-grained series \(\alpha = 1, \ldots ,\beta\) gives \(\overline{\phi }_{\beta ,\alpha }^{m}\) and \(\overline{\phi }_{\beta ,\alpha }^{m + 1}\), from which RCmvMFE is calculated by Eq. (18):
$$ {\text{RCmvMFE}}\left( {Y, \beta ,M,n,r} \right) = - {\text{ln}}\left( {\frac{{\overline{\phi }_{\beta ,\alpha }^{m + 1} }}{{\overline{\phi }_{\beta ,\alpha }^{m} }}} \right) \tag{18} $$
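Steps 1–5 can be sketched with NumPy as below. Assumptions: channels are z-scored before coarse-graining, `r = 0.15` and `fp = 2` are illustrative defaults, every channel shares the same embedding dimension and delay, and "let m = m + 1" is read as increasing every channel's embedding dimension by one while keeping the same \(N - n\) vectors from Eq. (15). The pairwise distance matrix makes the cost \(O(N^2)\), which is acceptable for short coarse-grained segments. All function names are hypothetical.

```python
import numpy as np


def _coarse(y, beta, alpha):
    """Eq. (14): window means of length beta starting at offset alpha - 1."""
    n_seg = (y.shape[1] - (alpha - 1)) // beta
    seg = y[:, alpha - 1: alpha - 1 + n_seg * beta]
    return seg.reshape(y.shape[0], n_seg, beta).mean(axis=2)


def _mv_embed(z, m, tau, n):
    """Eq. (15): composite delay vectors, one row per time index i."""
    n_vec = z.shape[1] - n
    cols = [z[k, l * tau[k]: l * tau[k] + n_vec]
            for k in range(z.shape[0]) for l in range(m[k])]
    return np.stack(cols, axis=1)


def _phi(vectors, r, fp):
    """Eqs. (16)-(17): mean fuzzy similarity over all pairs i != j."""
    d = np.abs(vectors[:, None, :] - vectors[None, :, :]).max(axis=2)  # Chebyshev
    sim = np.exp(-(d ** fp) / r)
    np.fill_diagonal(sim, 0.0)                      # exclude j == i
    n = len(vectors)
    return sim.sum() / (n * (n - 1))


def rcmvmfe(y, beta, m=2, tau=1, r=0.15, fp=2.0):
    """Refined composite multivariate multiscale fuzzy entropy (sketch)."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean(axis=1, keepdims=True)) / y.std(axis=1, keepdims=True)
    p = y.shape[0]
    m = np.full(p, m, dtype=int)
    tau = np.full(p, tau, dtype=int)
    n = int(m.max() * tau.max())                    # n = max{M} x max{tau}
    phi_m, phi_m1 = [], []
    for alpha in range(1, beta + 1):                # all beta coarse-grained series
        z = _coarse(y, beta, alpha)
        phi_m.append(_phi(_mv_embed(z, m, tau, n), r, fp))
        phi_m1.append(_phi(_mv_embed(z, m + 1, tau, n), r, fp))
    return float(-np.log(np.mean(phi_m1) / np.mean(phi_m)))  # Eq. (18)
```

Since each \((m+1)\)-dimensional vector contains the corresponding \(m\)-dimensional one, distances can only grow and the ratio in Eq. (18) stays at or below 1, so the sketch always returns a non-negative value.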
2.4.4 RCmvMSE
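RCmvMSE [31] follows the same steps as RCmvMFE, except that the probability \(B^{m} \left( r \right)\) for embedding dimension \(m\) is computed by counting the vector pairs whose distance does not exceed \(r\), instead of by a fuzzy membership function.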
Let m = m + 1, and repeat the above steps to get \(B^{m + 1} \left( r \right)\). Calculate the mean values \(\overline{B}_{\beta ,\alpha }^{m}\) and \(\overline{B}_{\beta ,\alpha }^{m + 1}\) in \(m\) and \(m + 1\) dimensions. RCmvMSE can be calculated by Eq. (21)
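A corresponding sketch for RCmvMSE replaces the fuzzy membership by a hard count of pairs with distance at most `r`. Assumptions: equal embedding dimension and unit delay for all channels, z-scored channels, and `r = 0.15`. With short or very irregular segments the count at dimension \(m + 1\) can be zero, in which case the entropy is undefined (this sketch would then return infinity). `rcmvmse` is a hypothetical name.

```python
import numpy as np


def rcmvmse(y, beta, m=2, r=0.15):
    """Refined composite multivariate multiscale sample entropy (sketch)."""
    y = np.asarray(y, dtype=float)
    y = (y - y.mean(axis=1, keepdims=True)) / y.std(axis=1, keepdims=True)
    p, length = y.shape
    b_m, b_m1 = [], []
    for alpha in range(1, beta + 1):                # all beta coarse-grained series
        n_seg = (length - (alpha - 1)) // beta
        z = y[:, alpha - 1: alpha - 1 + n_seg * beta].reshape(p, n_seg, beta).mean(axis=2)
        n_vec = n_seg - m                           # n = max{M} x max{tau} = m (unit delay)
        for dim, store in ((m, b_m), (m + 1, b_m1)):
            vec = np.stack([z[k, l: l + n_vec] for k in range(p) for l in range(dim)], axis=1)
            d = np.abs(vec[:, None, :] - vec[None, :, :]).max(axis=2)  # Chebyshev distance
            matches = np.count_nonzero(d <= r) - n_vec                 # drop i == j pairs
            store.append(matches / (n_vec * (n_vec - 1)))
    return float(-np.log(np.mean(b_m1) / np.mean(b_m)))
```

The hard threshold makes RCmvMSE cheaper to interpret but more sensitive to the choice of `r` than the fuzzy version, which is one reason both entropies can carry complementary information in a fused feature set.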
2.4.5 MPE
To better analyze the dynamic characteristics of electroencephalogram (EEG) signals, Ouyang G et al. [32] proposed multiscale permutation entropy, based on permutation entropy.
Step 1: Coarse-grain the original sequence Y of length N to obtain a new time series \(y^{\left( \tau \right)}\), where \(\tau\) is the scale factor.
Step 2: Apply phase space reconstruction to \( y^{\left( \tau \right)}\) to obtain the embedding vectors \(X_{i}\),
where \(m\) is the embedding dimension and \(\lambda\) is the delay time.
Step 3: Sort the elements of each \(X_{i}\) in ascending order to obtain its sequence of position indexes (ordinal pattern). There are \(m!\) possible patterns, and the probability of each is calculated by Eq. (24),
where \(T\left( \omega \right)\) is the number of occurrences of permutation \(\omega\), \(1 \le \omega \le m!\)
Step 4: Define the multiscale permutation entropy by Eq. (25).
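A short NumPy sketch of Steps 1–4 is given below. Ordinal patterns are taken from `numpy.argsort` with a stable sort, so tied values are ranked by position (an assumption), and `normalize=True` divides by \(\ln m!\); whether Eq. (25) is normalized is not stated here. `mpe` and its defaults are hypothetical.

```python
from math import factorial, log

import numpy as np


def mpe(x, scale, m=3, lam=1, normalize=False):
    """Multiscale permutation entropy at one scale factor (sketch)."""
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // scale
    y = x[: n_seg * scale].reshape(n_seg, scale).mean(axis=1)          # Step 1
    n = len(y) - (m - 1) * lam
    emb = np.stack([y[j * lam: j * lam + n] for j in range(m)], axis=1)  # Step 2: X_i
    patterns = np.argsort(emb, axis=1, kind="stable")                  # Step 3
    _, counts = np.unique(patterns, axis=0, return_counts=True)        # T(omega)
    p = counts / n
    h = float(-np.sum(p * np.log(p)))                                  # Step 4
    return h / log(factorial(m)) if normalize else h
```

A strictly increasing series produces a single ordinal pattern and an entropy of 0, whereas white noise visits all \(m!\) patterns almost equally and yields a normalized value close to 1.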
2.5 Deviation
To measure the stability of the model, a new deviation indicator is defined. Suppose that experiment A is repeated N times and that the result of the i-th repetition is \(A_{i}\), i = 1, 2, ..., N. The deviation is
$$ {\text{Deviation}} = {\text{max}}\left[ {A_{i} } \right] - {\text{min}}\left[ {A_{i} } \right] $$
where \({\text{max}}\left[ {A_{i} } \right]\) and \({\text{min}}\left[ {A_{i} } \right]\) are the maximum and minimum values of \(A_{i}\), respectively.
3 The WHO–VMD–CCWT–EFF
The framework of the WHO–VMD–CCWT–EFF model is shown in Fig. 2. The method consists of three parts: fault signal denoising, feature extraction and fusion, and feature classification.
Denoising: First, the WHO-optimized VMD decomposes each bearing fault signal into IMFs. Secondly, the correlation coefficient between each IMF and the original bearing fault signal is calculated. Then, the IMFs whose correlation coefficients with the original signal are greater than 0.2 are selected. Finally, each selected IMF is multiplied by its correlation coefficient as a weight, and the weighted IMFs are summed to reconstruct the fault signal.
Feature extraction: RCMDE, RCMFDE, RCmvMFE, RCmvMSE, and MPE are extracted from the denoised fault signal, and the five entropy features are then fused.
Classification: The feature samples of the fault signals are divided into training and test sets according to the experimental requirements, and the fault signals are then classified by the Fisher classifier.
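To illustrate the classification stage together with the deviation metric of Sect. 2.5, the sketch below implements a multi-class Fisher linear discriminant in NumPy: it projects onto the leading eigenvectors of \(S_w^{-1} S_b\) and assigns each sample to the nearest projected class mean. It then draws a fixed number of training samples per class at random, repeats the experiment, and reports the mean accuracy and the max minus min deviation. The small ridge added to \(S_w\), the nearest-mean decision rule, and all function names are assumptions rather than the paper's exact classifier.

```python
import numpy as np


def fisher_fit(x, y, reg=1e-6):
    """Multi-class Fisher discriminant: directions maximising S_b relative to S_w."""
    classes = np.unique(y)
    d = x.shape[1]
    overall = x.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    means = []
    for c in classes:
        xc = x[y == c]
        mc = xc.mean(axis=0)
        means.append(mc)
        s_w += (xc - mc).T @ (xc - mc)                       # within-class scatter
        diff = (mc - overall)[:, None]
        s_b += len(xc) * (diff @ diff.T)                     # between-class scatter
    s_w += reg * max(np.trace(s_w) / d, 1e-12) * np.eye(d)   # ridge for few samples
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w))           # S_w = L L^T
    _, vecs = np.linalg.eigh(l_inv @ s_b @ l_inv.T)          # symmetric form of S_w^-1 S_b
    w = l_inv.T @ vecs[:, ::-1][:, : len(classes) - 1]       # leading C - 1 directions
    return w, classes, np.array(means) @ w


def fisher_predict(model, x):
    w, classes, centroids = model
    proj = x @ w
    dist = ((proj[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]                      # nearest projected class mean


def repeated_accuracy(x, y, n_train_per_class, n_repeats=10, seed=0):
    """Mean test accuracy and deviation (max - min) over repeated random splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        train = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train[idx[:n_train_per_class]] = True
        model = fisher_fit(x[train], y[train])
        accs.append(np.mean(fisher_predict(model, x[~train]) == y[~train]))
    accs = np.array(accs)
    return float(accs.mean()), float(accs.max() - accs.min())
```

With 100 samples per class, `repeated_accuracy(features, labels, n_train_per_class=10)` corresponds to the 1:9 training-to-test ratio used in the experiments. Both returned values are fractions, so they are multiplied by 100 to obtain percentages.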
4 Experimental analyses
In order to verify the effectiveness of the method proposed in this paper, two classical public datasets are used in the experiments.
4.1 Analysis of bearing fault signal
4.1.1 WHO–VMD
Because VMD decomposition is strongly affected by the parameters \(\left( {k,\alpha } \right)\), WHO is used to optimize them. Figure 3 shows the convergence curves for some of the artificial and real damage signals in the Paderborn dataset. The number of iterations required for convergence differs between fault signals; therefore, to cover as many fault signals as possible, 30 is chosen as the number of iterations. Similarly, the number of search agents is set to 30, since Fig. 4 shows that convergence is better with 30 search agents.
Taking the 0.3556 mm outer race fault at 0HP as an example, the fitness curve of WHO-VMD is shown in Fig. 5. The fitness value at the first iteration is \(7.2145 \times 10^{ - 4}\); it then decreases slightly to \(7.0273 \times 10^{ - 4}\) and remains flat after the second iteration. It drops to \(5.66 \times 10^{ - 4}\) after the fifth iteration and reaches its minimum after the 14th iteration. These results show that WHO converges quickly when optimizing the VMD parameters, which suggests that it is suitable for this task. The parameters obtained by WHO-VMD in this experiment are \(k = 5,\,\alpha = 1106\).
To verify the advantage of WHO in optimizing the VMD parameters, the particle swarm optimization algorithm (PSO) [33], the whale optimization algorithm (WOA) [34], and the moth-flame optimization algorithm (MFO) [35] are also used to optimize them. Each algorithm uses a population of 30 and a maximum of 30 iterations, giving the fitness convergence curves in Fig. 6. The convergence curve of PSO is unstable and oscillates up and down. WOA and MFO converge after the 2nd and 6th iterations, respectively, both at a power spectrum entropy of \(7.0273 \times 10^{ - 4}\), so they find their optimum in few iterations. In contrast, WHO reaches the final fitness value of WOA and MFO after the 2nd iteration and keeps improving thereafter, down to \(5.646 \times 10^{ - 4}\). This demonstrates the advantage of WHO in optimizing the VMD parameters.
4.1.2 CCWT
After the VMD parameters are determined, the fault signal is decomposed into IMFs and denoised with CCWT. Figure 7 shows the correlation coefficients of the IR and OR signals in the Paderborn real damage D2 dataset. With a CCWT threshold of 0.3, half of the IMFs are filtered out, which may cause excessive denoising and the loss of useful information. With a threshold of 0.1, no IMFs are filtered, so the expected denoising effect is not achieved. This paper therefore chooses 0.2 as the CCWT denoising threshold: the IMFs with correlation coefficients less than 0.2 are filtered out first, and the correlation coefficients of the remaining IMFs are then used as their weights to reconstruct the original signal.
Take the OR (Label = 3) of the Paderborn real damage D4 dataset as an example. For ease of observation, 500 sample points are selected to compare the signal before and after denoising, as shown in Fig. 8. With the correlation coefficient threshold (CCT) method, IMFs whose correlation coefficients are less than 0.2 are removed and the remaining components are reconstructed. CCWT additionally applies the correlation coefficients (all greater than 0.2) to the corresponding IMFs as weights. As Fig. 8 shows, the CCWT-denoised signal is smoother and has fewer spikes than the CCT-denoised signal. Therefore, the denoising effect of CCWT is considered better.
Although the threshold of 0.2 is chosen based on the characteristics of most fault signals, the large number of fault signals in both datasets means that some signals still have all correlation coefficients greater than 0.2. Taking the IR (Label = 8) of the Paderborn real damage D1 dataset as an example, the denoising effects of CCT and CCWT in this situation are compared. As Fig. 9 shows, the CCT-denoised signal is almost indistinguishable from the original signal, because all correlation coefficients exceed 0.2 and no IMF is removed. CCWT avoids this defect and still achieves a good denoising effect. The larger the correlation coefficient between an IMF and the original signal, the more useful information that IMF is assumed to contain; conversely, the smaller the correlation coefficient, the more noise the IMF is considered to contain. Because no correlation coefficient exceeds 1, weighting by the correlation coefficients attenuates every IMF, but it attenuates the noise-dominated IMFs much more than the informative ones. The CCWT method therefore relatively enhances the useful signal while suppressing the noise.
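The CCWT step itself can be sketched in a few lines of NumPy, assuming the IMFs are already available as rows of an array (for example from a VMD routine). `ccwt_reconstruct` is a hypothetical name; it uses the Pearson correlation from `numpy.corrcoef`, gives weight 0 to IMFs below the threshold, keeps the correlation coefficient as the weight of the others, and sums the weighted IMFs.

```python
import numpy as np


def ccwt_reconstruct(signal, imfs, threshold=0.2):
    """Correlation coefficient weight threshold reconstruction (sketch)."""
    signal = np.asarray(signal, dtype=float)
    imfs = np.asarray(imfs, dtype=float)
    # Correlation of each IMF with the original signal
    r = np.array([np.corrcoef(imf, signal)[0, 1] for imf in imfs])
    # IMFs below the threshold are removed; the others are weighted by r
    weights = np.where(r >= threshold, r, 0.0)
    return weights @ imfs, weights
```

Setting all non-zero weights to 1 instead would give the CCT variant compared above; if every IMF passes the threshold, CCT returns the original signal unchanged, whereas CCWT still rescales the IMFs.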
4.2 The CWRU dataset
4.2.1 The CWRU dataset description
The CWRU experimental data come from the rolling bearing test stand shown in Fig. 10. A 6205-2RS JEM SKF deep groove ball bearing is used as the test bearing, the data are collected under four loads (0HP, 1HP, 2HP, and 3HP), and the sampling frequency is 12 kHz. The experiment includes three types of faults made by electro-discharge machining (EDM): inner race, outer race, and ball faults. Each fault has three damage diameters, 0.1778 mm, 0.3556 mm, and 0.5334 mm, as shown in Table 1. In the experiment, 100 non-overlapping samples are extracted from each fault signal.
4.2.2 Experimental analysis of single working condition bearing fault diagnosis
After feature extraction, the Fisher classifier is used to classify the fused features. Each class of fault signal contains 100 samples, which are divided into training and test sets in different proportions, and each result is the average of ten repetitions. To evaluate the chosen classifier, three classifiers (decision tree, SVM, and Fisher) are compared at different training-to-test ratios. As shown in Fig. 11, the accuracy of all three classifiers under the four working conditions increases as the ratio of training to test samples grows. The decision tree performs best on the 2HP data, while the SVM has higher accuracy under 1HP. With the Fisher classifier, the accuracy exceeds 99% for all working conditions except 0HP at a training-to-test ratio of 1:9, where it is slightly lower. At a ratio of 1:9, the Fisher classifier improves accuracy by 4.5–5.4% over the decision tree classifier and by 3.2–4.63% over the SVM classifier. The Fisher classifier therefore retains its advantage when the number of samples is small, which is precisely the scenario targeted by this paper.
For a bearing fault diagnosis method, high accuracy is the primary requirement, but model stability is also crucial. To further illustrate the stability of WHO-VMD-CCWT-EFF, the results of 10 experiments under the four working conditions are recorded in Fig. 12, and the difference between the maximum and minimum values of the 10 experiments is used as the deviation. As Fig. 12 shows, at a training-to-test ratio of 1:9 the deviation under 0HP is 2.23%, the largest of the four working conditions. At a ratio of 2:8, the deviations under 0HP and 1HP are 0.25%, and those under 2HP and 3HP are 0. This indicates that, with few training samples, the 0HP data are slightly less stable than the data under the other working conditions. However, when the proportion of training samples is increased slightly while still remaining small, the 0HP data also perform very well.
Table 2 presents the experimental results of the four working conditions under different training-to-test ratios. Under all four working conditions, the Fisher classifier has not only the highest accuracy but also the smallest deviation, which further verifies the effectiveness and stability of WHO-VMD-CCWT-EFF. The accuracy of the other two classifiers also exceeds 99% at a ratio of 9:1 and remains around 95% at 1:9, which suggests that the denoising and feature extraction are effective; feature extraction is analyzed in detail later.
4.2.3 Experimental analysis of bearing fault diagnosis under multiple working conditions
Because the actual industrial environment is complex and changeable, the collected data cannot be guaranteed to come from the same working condition. Therefore, a series of bearing fault diagnosis experiments under multiple working conditions is carried out. With four working conditions in the CWRU dataset, the experiments include six mixtures of two working conditions and four mixtures of three working conditions. As in the single-working-condition experiments, the training and test sets are divided in different proportions, and each experiment is repeated ten times with the average taken as the final result. The data for the two- and three-working-condition experiments are detailed in Tables 3 and 4, respectively, using 0HP + 1HP and 0HP + 1HP + 2HP as examples.
Figure 13 shows the experimental results under multiple working conditions. Compared with Fig. 11, WHO-VMD-CCWT-EFF performs better in the multiple-working-condition experiments. To explain this, the experiments in Fig. 14 are performed. The accuracy of all three classifiers is almost always higher under multiple working conditions than under a single working condition. The phenomenon is therefore not related to the classifier; it may arise because the features extracted from the same fault under different working conditions are relatively similar, or because the number of samples increases in the multiple-working-condition experiments.
To verify whether this phenomenon is related to the increase in the number of samples, experiments are carried out with the 0HP + 1HP combination as an example, using the same numbers of training and test samples as in the single-working-condition experiments. Table 5 presents the experimental data in detail.
Figure 15 shows the results: the accuracy of the Fisher classifier differs by 0–0.41% between the two cases, while that of the decision tree and SVM differs by 0–2.9%. This suggests that, for some classifiers, the sample size has a noticeable effect on accuracy, whereas the proposed model is almost unaffected. For further validation, the experiments in Fig. 16 are performed, using the numbers of samples for two and three working conditions given in Tables 3 and 4 and a training-to-test ratio of 1:9 for each experiment. Some three-working-condition experiments achieve higher accuracy than the two-working-condition experiments, while the rest are lower, and the four-working-condition experiment has the lowest accuracy. This indicates that, for the proposed model, more samples do not necessarily improve the diagnostic accuracy.
4.3 Paderborn university dataset
4.3.1 Dataset description
The Paderborn dataset was released by Christian Lessmeier et al. of the Kat-Data Center, using type 6203 deep groove ball bearings as test bearings to collect an artificial damage dataset and a real damage dataset. The rolling bearing test stand is shown in Fig. 17. As with the CWRU data, the Paderborn dataset is collected under four working conditions, as shown in Table 6. As shown in Tables 7 and 8, the artificial damage dataset contains 8 faults in total, while the real damage dataset contains 9. In both datasets, the damage involves the inner ring (IR) and the outer ring (OR). For each fault type, 100 non-overlapping samples are extracted from the signal for the experiments.
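The non-overlapping sample extraction can be sketched as follows. This is a minimal illustration, not the authors' code: the sample length is not given in this section, so `seg_len` is a hypothetical parameter, and the windows are assumed to be taken consecutively from the start of each record.

```python
from typing import List, Sequence


def segment_non_overlapping(signal: Sequence[float], n_samples: int,
                            seg_len: int) -> List[List[float]]:
    """Cut `signal` into `n_samples` consecutive, non-overlapping windows.

    `seg_len` is a hypothetical sample length chosen by the caller; the
    value used in the paper is not stated in this section.
    """
    if n_samples <= 0 or seg_len <= 0:
        raise ValueError("n_samples and seg_len must be positive")
    needed = n_samples * seg_len
    if len(signal) < needed:
        raise ValueError(f"signal has {len(signal)} points, {needed} needed")
    # Window k covers indices [k*seg_len, (k+1)*seg_len), so no point is shared.
    return [list(signal[k * seg_len:(k + 1) * seg_len])
            for k in range(n_samples)]
```

Extracting 100 windows of length L therefore requires a record of at least 100·L points; any remaining tail is simply discarded in this sketch.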
4.3.2 Experimental analysis of bearing fault diagnosis under a single working condition
Efficient and accurate diagnosis of bearing fault types depends largely on feature extraction: the more distinctive the extracted fault features, the better the classification. To confirm the advantages of the proposed feature extraction method, the features of the artificial and real damage data under the four working conditions are visualized.
Figure 18a–d and e–h show the feature visualizations for the four single working conditions of the artificial damage and real damage data, respectively. D3 performs best in the feature visualization of the artificial damage, indicating more successful feature extraction. In contrast, D1 performs relatively poorly, which is consistent with its lower accuracy compared with the other data in Table 9. The real damage experiment shows the same pattern. This suggests that the relatively low accuracy of D1 may be due to its acquisition conditions.
The experimental results for artificial and real damage are given in Tables 9 and 10, respectively. When the ratio of training to test samples is low, the accuracy is relatively low and the deviation is large. As the ratio increases, all data reach 100% accuracy except real damage D1, which reaches 99.77%. Increasing the ratio of training to test samples therefore improves the results. In addition, real damage D2 and D3 reach 100% at a ratio of 2:8, and D4 reaches 100% at 3:7. This indicates that the WHO-VMD-CCWT-EFF can identify the bearing fault type even with only a small number of training samples.
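A train:test ratio such as 1:9 applied to 100 samples per fault type can be sketched as a per-class split. This is an illustrative assumption: the section does not say how samples are assigned, so the seeded random shuffle within each class (stratification) and the rounding rule below are hypothetical choices, not the authors' procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_parts: int, test_parts: int,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), applying train:test = train_parts:test_parts
    separately within each class so every fault type keeps the same ratio."""
    rng = random.Random(seed)  # seeded so a given split is reproducible
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train: List[int] = []
    test: List[int] = []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)  # assumption: random assignment within the class
        n_train = round(len(idx) * train_parts / (train_parts + test_parts))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)
```

With 100 samples per class, a 1:9 ratio leaves 10 training and 90 test samples per fault type; repeating the split with different seeds is one way to obtain the mean accuracy and deviation reported in tables of this kind.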
4.3.3 Experimental analysis of bearing fault diagnosis under multiple working conditions
As with the CWRU data, experiments on artificial and real damage under multiple working conditions are carried out after the single-working-condition experiments. The artificial damage dataset is analyzed in detail here; the real damage dataset is treated in the same way. The experimental data for two and three working conditions are described in Tables 11 and 12, respectively.
Tables 13 and 14 give the results of the experiments on artificial and real damage under multiple working conditions, respectively. For both artificial and real damage, the accuracy exceeds 99% when the ratio of training to test samples is 1:9. The accuracy under multiple working conditions is higher than under a single working condition, which is consistent with the conclusions drawn from the CWRU data. In addition, the deviation is relatively large at 1:9, improves at 2:8, and then fluctuates as the ratio increases. This shows that the choice of training-to-test ratio is very important in multiple-working-condition experiments.
To verify the effectiveness of the proposed entropy fusion method, experiments with single entropies and with fused entropies are carried out. Nine entropies are selected for the experiments in Tables 15 and 16: RCMDE, RCMFDE, RCmvMFE, RCmvMSE, MPE, multiscale dispersion entropy (MDE), multiscale weighted permutation entropy (MWPE), multivariate fuzzy entropy (MVFE), and multivariate sample entropy (MVSE). MVFE and MVSE are single-scale entropies, and the remaining seven are multiscale entropies. In the multiple-working-condition experiments, for example, the multiscale entropies far outperform the single-scale entropies for both artificial and real damage. An entropy fusion method is therefore proposed to improve the accuracy and reduce the deviation of the model.
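As an illustration of one of the fused features, multiscale permutation entropy (MPE) can be sketched in its textbook form: coarse-grain the series by averaging non-overlapping windows at each scale, then compute the normalized Bandt-Pompe permutation entropy. The embedding dimension `m`, delay `tau`, maximum scale, and tie-handling rule below are assumptions for illustration; the paper's settings are not given in this section.

```python
import math
from typing import Dict, List, Sequence, Tuple


def coarse_grain(x: Sequence[float], scale: int) -> List[float]:
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(n)]


def permutation_entropy(x: Sequence[float], m: int = 3, tau: int = 1) -> float:
    """Bandt-Pompe permutation entropy, normalized by log(m!) to lie in [0, 1]."""
    if m < 2:
        raise ValueError("embedding dimension m must be at least 2")
    n_vec = len(x) - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and tau")
    counts: Dict[Tuple[int, ...], int] = {}
    for i in range(n_vec):
        window = [x[i + k * tau] for k in range(m)]
        # Ordinal pattern = index order that sorts the window; ties are
        # broken by position (one common convention among several).
        pattern = tuple(sorted(range(m), key=lambda k: (window[k], k)))
        counts[pattern] = counts.get(pattern, 0) + 1
    h = sum(-(c / n_vec) * math.log(c / n_vec) for c in counts.values())
    return h / math.log(math.factorial(m))


def multiscale_permutation_entropy(x: Sequence[float], max_scale: int = 5,
                                   m: int = 3, tau: int = 1) -> List[float]:
    """One permutation-entropy value per scale 1..max_scale (the MPE curve)."""
    return [permutation_entropy(coarse_grain(x, s), m, tau)
            for s in range(1, max_scale + 1)]
```

A strictly increasing series has a single ordinal pattern and therefore entropy 0, while an irregular series spreads its windows over many patterns and approaches 1. The refined composite variants (RCMDE, RCMFDE, and so on) differ mainly in averaging over several shifted coarse-grainings, which this sketch does not implement.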
The main goals of entropy feature fusion are to improve accuracy and minimize deviation. Based on Tables 15 and 16, the better-performing entropy features are fused in turn, giving the results for artificial damage in Table 17 and for real damage in Table 18. Comparing Table 17 with Table 18 shows that the WHO-VMD-CCWT-EFF is more applicable to the real damage dataset, which is the focus of our efforts. In the entropy fusion, the performance on D1, D2, and D4 is not the best, but it differs only slightly from the best result.
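The "fuse the better-performing entropies in turn" procedure can be sketched as ranking the entropies by a single-entropy score and building the top-1, top-2, ... candidate sets. This section does not state how the chosen entropies are combined, so the sketch assumes plain concatenation of each entropy's feature vector; the score used for ranking (for example mean accuracy) and any names passed in are placeholders, not the paper's results.

```python
from typing import Dict, List, Sequence


def incremental_fusion_orders(single_scores: Dict[str, float]) -> List[List[str]]:
    """Rank entropies by single-entropy score (higher is better) and return
    the candidate fusion sets [top-1], [top-1, top-2], ..."""
    ranked = sorted(single_scores, key=lambda name: single_scores[name],
                    reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def fuse_features(per_entropy: Dict[str, Sequence[float]],
                  selected: Sequence[str]) -> List[float]:
    """Assumed fusion rule: concatenate the selected entropies' feature
    vectors (e.g. one value per scale) into a single vector."""
    fused: List[float] = []
    for name in selected:
        fused.extend(per_entropy[name])
    return fused
```

Each candidate set would then be evaluated with the classifier, which matches the observation above that more fused entropies are not automatically better: the number of entropies becomes a choice to validate rather than maximize.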
To further support the model, the results of the three-working-condition experiments on D1, D2, and D3 at different ratios of training to test samples are analyzed. The trends of the entropy fusion results for artificial and real damage on D1, D2, and D3 are given in Figs. 19 and 20, respectively. As the number of fused entropy features increases, the accuracy for artificial damage increases and the deviation decreases. For real damage, fusing five entropies does not give the best performance when the ratio of training to test samples is high. Thus, for entropy feature fusion, more is not always better; the types and number of entropies should be chosen reasonably according to the experiment. The model in this paper fuses five entropy features.
4.4 Model comparison
To verify the performance of the proposed method, existing models from recent years are selected for comparison, as shown in Table 19. The CWRU dataset, the most classic benchmark in bearing fault diagnosis, is used for the comparison. The results indicate that the proposed model identifies bearing faults with higher accuracy when the training and test data are the same as in the other studies. In addition, it uses less data while maintaining the same level of accuracy.
5 Conclusions
A bearing fault diagnosis method, WHO-VMD-CCWT-EFF, based on signal denoising and feature fusion is proposed to address the low accuracy of traditional methods and the large number of data samples required by deep learning methods. This paper focuses on two aspects: bearing fault signal denoising and feature extraction. To verify the effectiveness and stability of the model, the CWRU and Paderborn datasets are used for various single- and multiple-working-condition experiments. The results show that WHO-VMD-CCWT-EFF performs well under both single and multiple working conditions when the training data from either dataset are small. This conclusion is supported by the following findings:
(1) The WHO-VMD-CCWT-EFF model can accurately identify the fault status of bearings: experiments on the CWRU and Paderborn datasets (12 single and 30 multiple working conditions) achieve over 99% accuracy.
(2) On the Paderborn dataset, when the ratio of training to test samples is 1:9, the difference in accuracy between real and artificial damage is 0.02–1.44%. This indicates that the model has good stability and generalization ability even with small samples.
(3) The fused entropy feature vector is an effective way to extract bearing fault features. In the CWRU experiments with small samples, the Fisher discriminant classifier achieves over 98% accuracy, and the decision tree and SVM classifiers also exceed 93.5%, which supports this point.
(4) The model performs better on the CWRU dataset than on the Paderborn dataset. This indicates that differences between data collected on different equipment affect model performance. Future work will therefore focus on cross-equipment bearing fault diagnosis.
Data availability
The datasets analyzed during the current study are available from the Case Western Reserve University Bearing Data Center and the Paderborn University Kat-Data Center websites.
References
Saadatmorad, M., Talookolaei, R.A.J., Pashaei, M.H., et al.: Pearson correlation and discrete wavelet transform for crack identification in steel beams. Mathematics 10(15), 2689 (2022)
Tiachacht, S., Khatir, S., Thanh, C.L., et al.: Inverse problem for dynamic structural health monitoring based on slime mould algorithm. Eng. Comput. 1–24 (2021)
Cheng, X.: The application of automation technology in mechanical design and manufacturing (2019)
Gültekin, Ö., Çinar, E., Özkan, K., et al.: A novel deep learning approach for intelligent fault diagnosis applications based on time-frequency images. Neural Comput. Appl. 34(6), 4803–4812 (2022)
Wang, X., Si, S., Li, Y.: Multiscale diversity entropy: a novel dynamical measure for fault diagnosis of rotating machinery. IEEE Trans. Ind. Inform. 99, 1–1 (2020)
Song, X., Liao, Z., Wang, H., et al.: Incrementally accumulated holographic SDP characteristic fusion method in ship propulsion shaft bearing fault diagnosis. Meas. Sci. Technol. 4, 33 (2022)
Li, Y., Shi, Z., Lin, T.R., et al.: An iterative reassignment based energy-concentrated TFA post-processing tool and application to bearing fault diagnosis. Measurement 193, 110953 (2022)
Pan, H., Xu, H., Zheng, J., et al.: Multi-class fuzzy support matrix machine for classification in roller bearing fault diagnosis. Adv. Eng. Inform. 51, 101445 (2022)
Chang, Y., Bao, G., Cheng, S., et al.: Improved VMD-KFCM algorithm for the fault diagnosis of rolling bearing vibration signals. IET Signal Process. 15, 238–250 (2021)
He, F., Ye, Q.: A bearing fault diagnosis method based on wavelet packet transform and convolutional neural network optimized by simulated annealing algorithm. Sensors 22(4), 1410 (2022)
Zhang, Y., Wang, J., Zhang, F., Lv, S., Zhang, L., Jiang, M., Sui, Q.: Intelligent fault diagnosis of rolling bearing using the ensemble self-taught learning convolutional auto-encoders. IET Sci. Meas. Technol. 16(2), 130–147 (2022)
Hui, K.H., Lim, M.H., Leong, M.S., et al.: Dempster-Shafer evidence theory for multi-bearing faults diagnosis. Eng. Appl. Artif. Intell. 57, 160–170 (2017)
Lyu, P., Zhang, K., Yu, W., et al.: A novel RSG-based intelligent bearing fault diagnosis method for motors in high-noise industrial environment. Adv. Eng. Inform. 52, 101564 (2022)
Yang, S., Yang, P., Yu, H., Bai, J., Feng, W., Su, Y., Si, Y.: A 2DCNN-RF model for offshore wind turbine high-speed bearing-fault diagnosis under noisy environment. Energies 15(9), 3340 (2022)
Sohaib, M., Kim, C.-H., Kim, J.-M.: A hybrid feature model and deep-learning-based bearing fault diagnosis. Sensors 17(12), 2876 (2017)
Wang, Q., Yang, C., Wan, H., et al.: Bearing fault diagnosis based on optimized variational mode decomposition and 1D convolutional neural networks. Meas. Sci. Technol. 32(10), 104007 (2021)
Zhang, A., Li, S., Cui, Y., et al.: Limited data rolling bearing fault diagnosis with few-shot learning. IEEE Access 99, 1–1 (2019)
Case Western Reserve University Bearing Data Center website. http://csegroups.case.edu/bearingdatacenter/home
Lessmeier, C., et al.: Kat-Data Center, Chair of Design and Drive Technology, Paderborn University. mb.uni-paderborn.de/kat/datacenter
Dragomiretskiy, K., Zosso, D.: Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014)
Naruei, I., Keynia, F.: Wild horse optimizer: a new meta-heuristic algorithm for solving engineering optimization problems. Eng. Comput. 38(Suppl 4), 3025–3056 (2022)
Ho, L.V., Nguyen, D.H., Mousavi, M., et al.: A hybrid computational intelligence approach for structural damage detection using marine predator algorithm and feedforward neural networks. Comput. Struct. 252, 106568 (2021)
Ho, L.V., Trinh, T.T., De Roeck, G., et al.: An efficient stochastic-based coupled model for damage identification in plate structures. Eng. Fail. Anal. 131, 105866 (2022)
Al Thobiani, F., Khatir, S., Benaissa, B., et al.: A hybrid PSO and Grey Wolf Optimization algorithm for static and dynamic crack identification. Theoret. Appl. Fract. Mech. 118, 103213 (2022)
Ji, Y., Wang, X., Liu, Z., et al.: EEMD-based online milling chatter detection by fractal dimension and power spectral entropy. Int. J. Adv. Manuf. Technol. 92(1781), 1–16 (2017)
Azami, H., Rostaghi, M., Abásolo, D., et al.: Refined composite multiscale dispersion entropy and its application to biomedical signals. IEEE Trans. Bio-Med. Eng. 99, 1–1 (2017)
Rostaghi, M., Azami, H.: Dispersion entropy: a measure for time-series analysis. IEEE Signal Process. Lett. 23(5), 610–614 (2016)
Azami, H., Escudero, J.: Amplitude- and fluctuation-based dispersion entropy. Entropy 20(3), 210 (2018)
Azami, H., Arnold, S.E., Sanei, S., et al.: Multiscale fluctuation-based dispersion entropy and its applications to neurological diseases. IEEE Access 7, 68718–68733 (2019)
Azami, H., Escudero, J.: Refined composite multivariate generalized multiscale fuzzy entropy: A tool for complexity analysis of multichannel signals. Physica A 465, 261–276 (2017)
Ahmed, M.U., Mandic, D.P.: Multivariate multiscale entropy analysis. IEEE Signal Process. Lett. 19(2), 91–94 (2012)
Ouyang, G., Li, J., Liu, X., et al.: Dynamic characteristics of absence EEG recordings with multiscale permutation entropy analysis. Epilepsy Res. 104(3), 246–252 (2013)
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4, pp. 1942–1948. IEEE (1995)
Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)
Mirjalili, S.: Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 89, 228–249 (2015)
Li, H., Huang, J., Ji, S.: Bearing fault diagnosis with a feature fusion method based on an ensemble convolutional neural network and deep neural network. Sensors 19(9), 2034 (2019)
Yang, J., Xie, G., Yang, Y., et al.: A multilevel recovery diagnosis model for rolling bearing faults from imbalanced and partially missing monitoring data. Math. Biosci. Eng. 20(3), 5223–5242 (2023)
Nie, G., Zhang, Z., Shao, M., et al.: A novel study on a generalized model based on self-supervised learning and sparse filtering for intelligent bearing fault diagnosis. Sensors 23(4), 1858 (2023)
Chang, Y., Bao, G., Cheng, S., et al.: Improved VMD-KFCM algorithm for the fault diagnosis of rolling bearing vibration signals. IET Signal Process. 4, 15 (2021)
Zhao, J., Yang, S., Li, Q., et al.: A new bearing fault diagnosis method based on signal-to-image mapping and convolutional neural network. Measurement 176(1), 109088 (2021)
Acknowledgements
We especially thank the Shanxi ‘1331 Project’ Key Subject Construction and Innovation Special Zone Project, China, for its funding.
Funding
This research is funded by the National Natural Science Foundation of China through the National Major Scientific Instruments Development Project (Grant No. 61927807) and the National Natural Science Foundation of China (Grant Nos. 51875535 and 61774137), the Fundamental Research Program of Shanxi Province, China (Grant Nos. 202103021224195, 202103021223189, 202103021224212, and 20210302123019), the Shanxi Scholarship Council of China (Grant Nos. 2020–104 and 2021–108), and the 18th Postgraduate Science and Technology Program of the North University of China (20221848).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, J., Bai, Y., Cheng, Y. et al. A new model for bearing fault diagnosis based on optimized variational mode decomposition correlation coefficient weight threshold denoising and entropy feature fusion. Nonlinear Dyn 111, 17337–17367 (2023). https://doi.org/10.1007/s11071-023-08728-9
DOI: https://doi.org/10.1007/s11071-023-08728-9