1 Introduction

To guarantee safe operation in production and to reduce the economic losses resulting from equipment failure, fault diagnosis of rotating machinery is becoming increasingly important. Rolling bearings are used extensively in rotating machinery, and bearing failure is one of the most frequent fault sources. This vital component must work with high reliability to reduce failure occurrences and unexpected machine breakdowns [1, 2]. Hence, it is important to detect bearing faults effectively. Vibration signal analysis has been widely used to inspect and diagnose bearing faults because vibration signals contain rich information about the working condition of the rolling bearing [3,4,5].

However, the measured vibration signals are nonlinear and non-stationary [6,7,8], which makes extracting fault features from these complicated signals a challenge. In view of this nonlinearity and non-stationarity, nonlinear parameter estimation methods have been applied to extract the state-related information buried in bearing vibrations. Among them, approximate entropy (ApEn) [9] was introduced by Yan and Gao [10] for the health monitoring of rolling bearings because it captures rich time-related information and is computationally efficient. Unfortunately, the calculation of ApEn is heavily affected by the data length, and the estimated value is uniformly lower than the expected one [11, 15]. To address the limitations of ApEn, Richman and Moorman [11] put forward sample entropy (SampEn), which performs better than ApEn. Nevertheless, both SampEn and ApEn use the Heaviside function to define the similarity degree of vectors, and this function has a rigid and discontinuous boundary [12]. Consequently, Chen et al. [12, 13] developed fuzzy entropy (FuzzyEn) by substituting the exponential function for the Heaviside function. Compared with SampEn, FuzzyEn has better statistical stability and is more suitable for measuring signal complexity because of the continuous boundaries of fuzzy functions. Hence, Zheng et al. [14] and Zhu et al. [15] used it to assess the complexity of bearing vibration signals. However, FuzzyEn only emphasizes the local characteristics of the signal and neglects its global fluctuation [16]. Therefore, Liu et al. [16] proposed the fuzzy measure entropy (FuzzyMEn) algorithm, which reflects both the local and the global characteristics of the signal. Compared with FuzzyEn, FuzzyMEn achieved better discrimination ability and was successfully used to analyze heart rate variability [16]. Considering the advantage of FuzzyMEn over FuzzyEn, we introduce FuzzyMEn to the field of bearing fault diagnosis.

Nevertheless, SampEn, FuzzyEn and FuzzyMEn are all designed to measure signal complexity on a single scale. To analyze signal complexity on multiple time scales rather than on only one, the multi-scale analysis method was proposed by Costa et al. [17, 18]. The multi-scale entropy approach was then used for the fault diagnosis of rolling bearings [19, 20]. Similarly, multi-scale fuzzy entropy (MFE) was developed by Zheng et al. [21]. Based on MFE, Li et al. [22] proposed an improved MFE method for extracting bearing fault features. In this paper, combining the merits of both FuzzyMEn and multi-scale analysis, we present a multi-scale fuzzy measure entropy (MFME) method to acquire the condition-related information buried in bearing vibration signals.

Naturally, after feature extraction using MFME, an intelligent classifier is required to complete the fault recognition based on the extracted features. The classification performance depends strongly on the quality of the features that are input to the classifier. The fault feature set obtained from the bearing signals using the MFME method is high-dimensional and contains irrelevant or redundant features, which makes the recognition process time-consuming and decreases the identification rate. Therefore, it is critical to select the most informative features to improve both classification efficiency and recognition accuracy [23]. Recently, an effective approach called infinite feature selection (Inf-FS) [24] was proposed to select the best features according to their importance. The most appealing characteristic of Inf-FS is that it evaluates the importance of a given feature while considering all the possible subsets of features [24]. For this reason, in this study, the Inf-FS method is adopted to choose the most representative features from the MFME features of bearing vibration signals.

Finally, after feature selection using the Inf-FS method, a multi-class classifier is used to automatically identify the different bearing working conditions. Over the past decades, a variety of classification techniques have been used in the field of mechanical fault diagnosis. Among them, the support vector machine (SVM) [25, 26] is one of the most extensively used. Being based on statistical learning theory, SVM is suitable for situations with a small number of samples. At the same time, SVM has good generalization ability and guarantees that the local optimal solution is also the global one [27]. Because it offers high accuracy and good generalization ability for a small number of samples [15, 27], SVM is adopted to carry out the fault diagnosis of rolling bearings.

To sum up, a new fault diagnosis method for rolling bearing is proposed based on MFME, Inf-FS and SVM in this paper. First, the MFME values of vibration signals of rolling bearing in various scales are calculated and treated as the original fault features. Then Inf-FS is employed to choose the most informative features from the original ones and the selected features are fed into the multi-class SVM classifier. Subsequently, the different working conditions of rolling bearings are identified by means of the outputs of the trained classifier.

2 Multi-scale Fuzzy Measure Entropy

2.1 Fuzzy Entropy

The calculation steps of FuzzyEn are as follows [12, 15].

  1. (1)

    For a time series \( \{ {u(i):1 \le i \le N}\} \) of length N, construct the m-dimensional vectors \( \varvec{X}_{i}^{m} \) as

    $$ \varvec{X}_{i}^{m} = \{ {u( i ),\;u( {i + 1}), \ldots ,u( {i + m - 1})}\} - u_{0} (i)\quad 1 \le i \le N - m + 1 $$
    (1)

    where \( \varvec{X}_{i}^{m} \) stands for the m consecutive u values starting at the ith point, generalized by subtracting their average \( u_{0} (i) \)

    $$ u_{0} \left( i \right) = m^{ - 1} \mathop \sum \limits_{j = 0}^{m - 1} u\left( {i + j} \right) $$
    (2)
  2. (2)

    The distance between \( \varvec{X}_{i}^{m} \) and \( \varvec{X}_{j}^{m} \) is designated as

    $$ d_{ij}^{m} = d\left[ {X_{i}^{m} ,X_{j}^{m} } \right] = \mathop {\max }\limits_{{k \in \left[ {0,m - 1} \right]}} \left| {\left( {u\left( {i + k} \right) - u_{0} \left( i \right)} \right) - \left( {u\left( {j + k} \right) - u_{0} \left( j \right)} \right)} \right| $$
    (3)
  3. (3)

    The similarity degree \( D_{ij}^{m} \) can be obtained by using a fuzzy function

    $$ D_{ij}^{m} = \mu \left( {d_{ij}^{m} ,r} \right) $$
    (4)
  4. (4)

    Denote \( \varphi_{i}^{m} (r) \) as

    $$ \varphi_{i}^{m} \left( r \right) = \left( {N - m - 1} \right)^{ - 1} \mathop \sum \limits_{j = 1,j \ne i}^{N - m} D_{ij}^{m} $$
    (5)
  5. (5)

    The function \( \varphi^{m} (r) \) is defined as

    $$ \varphi^{m} \left( r \right) = \left( {N - m} \right)^{ - 1} \mathop \sum \limits_{i = 1}^{N - m} \varphi_{i}^{m} \left( r \right) $$
    (6)
  6. (6)

    Similarly, \( \varphi^{m + 1} (r) \) is obtained by repeating the above procedure

    $$ \varphi^{m + 1} \left( r \right) = \left( {N - m} \right)^{ - 1} \mathop \sum \limits_{i = 1}^{N - m} \varphi_{i}^{m + 1} \left( r \right) $$
    (7)
  7. (7)

    Then define FuzzyEn of the sequence as

    $$ FuzzyEn\left( {m,r} \right) = \mathop {\lim }\limits_{N \to \infty } \left[ {\ln \varphi^{m} \left( r \right) - \ln \varphi^{m + 1} \left( r \right)} \right] $$
    (8)
  8. (8)

    Finally, for a finite data length N, FuzzyEn can be computed by

    $$ FuzzyEn\left( {m,r,N} \right) = \ln \varphi^{m} \left( r \right) - \ln \varphi^{m + 1} \left( r \right) $$
    (9)

The fuzzy function \( \mu \) used in FuzzyEn is the exponential function

$$ \mu \left( {d,r,n} \right) = e^{{ - \left( {d/r} \right)^{n} }} $$
(10)
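The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_entropy` is hypothetical, the default r = 0.2SD and n = 2 follow the values chosen later in the paper, and the first N - m vectors are used for both dimensions m and m + 1, as in Eqs. (5)-(7).

```python
import numpy as np

def fuzzy_entropy(u, m=2, r=None, n=2):
    """FuzzyEn(m, r, N) following Eqs. (1)-(10); r defaults to 0.2 * SD."""
    u = np.asarray(u, dtype=float)
    N = u.size
    if r is None:
        r = 0.2 * np.std(u)
    count = N - m  # number of vectors compared, for both m and m + 1

    def phi(dim):
        # Rows are the vectors X_i of length `dim`, Eq. (1).
        X = u[np.arange(count)[:, None] + np.arange(dim)[None, :]]
        X = X - X.mean(axis=1, keepdims=True)  # subtract local mean u_0(i), Eq. (2)
        d = np.zeros((count, count))
        for k in range(dim):  # maximum absolute difference, Eq. (3)
            d = np.maximum(d, np.abs(X[:, k, None] - X[None, :, k]))
        D = np.exp(-(d / r) ** n)   # similarity degree, Eqs. (4) and (10)
        np.fill_diagonal(D, 0.0)    # exclude j == i
        return D.sum() / (count * (count - 1))  # Eqs. (5)-(6)

    return np.log(phi(m)) - np.log(phi(m + 1))  # Eq. (9)

rng = np.random.default_rng(0)
noise = rng.normal(size=300)
sine = np.sin(2 * np.pi * 5 * np.arange(300) / 300)
print(fuzzy_entropy(noise), fuzzy_entropy(sine))
```

In this sketch, white noise yields a larger value than a pure sine wave, which matches the interpretation of entropy as a measure of irregularity.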

2.2 Fuzzy Measure Entropy

FuzzyEn subtracts the mean of each sequence segment, which causes it to neglect the global fluctuation in the signal. FuzzyMEn uses both the fuzzy local and the fuzzy global measure entropy to assess signal complexity and thus has better discrimination ability [16]. Considering this advantage, FuzzyMEn is introduced in this paper to evaluate the complexity of bearing vibration signals. The original FuzzyEn is referred to as the fuzzy local measure entropy (FuzzyLEn). The difference between the fuzzy global measure entropy (FuzzyGEn) and FuzzyLEn lies in the construction of the m-dimensional vectors [16], which for FuzzyGEn are defined as

$$ X_{i}^{m} = \left\{ {u\left( i \right),\;u\left( {i + 1} \right), \ldots ,u\left( {i + m - 1} \right)} \right\} - \bar{u}_{0} \left( i \right)\;\;\;1 \le i \le N - m + 1 $$
(11)

where \( \bar{u}_{0} \left( i \right) \) is the average of the entire sequence and therefore takes the same value for every i

$$ \bar{u}_{0} \left( i \right) = N^{ - 1} \mathop \sum \limits_{j = 1}^{N} u\left( j \right) $$
(12)

The other calculation steps of FuzzyGEn are the same as those of FuzzyLEn. Then, the FuzzyMEn of the sequence is defined as

$$ FuzzyMEn = FuzzyLEn + FuzzyGEn $$
(13)
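As a brief check of what Eq. (11) changes, the distance of Eq. (3) can be rewritten for the global baseline. This is a sketch under the definitions above, assuming the same distance is used for FuzzyGEn. Because \( \bar{u}_{0} \left( i \right) = \bar{u}_{0} \left( j \right) \) by Eq. (12), the baseline cancels in every pairwise difference:

$$ d_{ij}^{m} = \mathop {\max }\limits_{{k \in \left[ {0,m - 1} \right]}} \left| {\left( {u\left( {i + k} \right) - \bar{u}_{0} \left( i \right)} \right) - \left( {u\left( {j + k} \right) - \bar{u}_{0} \left( j \right)} \right)} \right| = \mathop {\max }\limits_{{k \in \left[ {0,m - 1} \right]}} \left| {u\left( {i + k} \right) - u\left( {j + k} \right)} \right| $$

Hence the FuzzyGEn distance compares the raw amplitudes of two segments, whereas the FuzzyLEn distance compares only their shapes after local mean removal. In an implementation, FuzzyGEn can therefore be obtained from the FuzzyLEn procedure by skipping the local baseline subtraction.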

2.3 Multi-scale Fuzzy Measure Entropy

The multi-scale fuzzy measure entropy (MFME) method is proposed in this paper. Its steps are as follows.

  1. (1)

    Given the original sequence \( \left\{ {u\left( i \right):1 \le i \le N} \right\}, \) predetermine the embedding dimension m and the similarity tolerance r. Then, construct the consecutive coarse-grained time series \( y_{j}^{\left( \tau \right)} \) according to

    $$ y_{j}^{(\tau )} = \frac{1}{\tau }\mathop \sum \limits_{{i = \left( {j - 1} \right)\tau + 1}}^{j\tau } u(i), \;\;\; 1 \le j \le \frac{N}{\tau } $$
    (14)

    where τ represents the scale factor.

  2. (2)

    For the same r, compute the FuzzyMEn value of each coarse-grained time series. The FuzzyMEn values at different scales can then be presented as a function of the scale factor; this process is called MFME analysis.
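The coarse-graining of Eq. (14) and the loop over scales can be sketched as follows. The names `coarse_grain` and `multiscale_profile` are hypothetical, and the entropy estimator is passed in as a callable so that any FuzzyMEn implementation can be plugged in. Samples that do not fill a complete window at the end of the series are discarded, which is an assumption about how non-integer N/τ is handled.

```python
import numpy as np

def coarse_grain(u, tau):
    """Average non-overlapping windows of length tau, Eq. (14)."""
    u = np.asarray(u, dtype=float)
    n_windows = u.size // tau  # incomplete trailing window is dropped
    return u[: n_windows * tau].reshape(n_windows, tau).mean(axis=1)

def multiscale_profile(u, entropy_fn, max_scale=20):
    """Apply entropy_fn to the coarse-grained series at scales 1..max_scale."""
    return np.array([entropy_fn(coarse_grain(u, tau))
                     for tau in range(1, max_scale + 1)])

print(coarse_grain(np.arange(1, 11), 3))  # [2. 5. 8.]
```

To keep r identical across scales, as step (2) requires, the callable would typically fix r from the original series before the loop rather than recompute it from each coarse-grained series.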

2.4 Parameter Selection

Before computing MFME, five parameters need to be chosen: m, r, N, n, and τ. A detailed reconstruction of the dynamic process requires a large embedding dimension m. However, too large an m results in information loss and requires a very large N (\( 10^{m} \)–\( 30^{m} \)). Generally, m is set to 2. N is fixed at 2048 because the computed entropy values depend only weakly on the data length. The width of the fuzzy function boundary is determined by the parameter r, while the boundary gradient is determined by the parameter n. Based on previous research [9, 12], r is chosen in the range of 0.1–0.25 times the standard deviation (SD) of the data, and a small integer is assigned to n. Here, r = 0.2SD and n = 2 are used. Finally, the maximum scale factor τ in MFME is set to 20.
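To illustrate how r and n shape the similarity degree, the fuzzy function of Eq. (10) can be evaluated at a few distances for n = 2. The sample distances below are illustrative choices, not values taken from the experiments:

$$ \mu \left( {r/2,r,2} \right) = e^{ - 1/4} \approx 0.779,\quad \mu \left( {r,r,2} \right) = e^{ - 1} \approx 0.368,\quad \mu \left( {2r,r,2} \right) = e^{ - 4} \approx 0.018 $$

The similarity thus decays smoothly with distance, whereas the Heaviside function used in SampEn and ApEn switches abruptly from 1 to 0 at d = r.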

3 Infinite Feature Selection

In this study, the feature vectors consist of the fault features obtained using MFME at 20 scales. However, some of these extracted features may be irrelevant or redundant. Moreover, the high feature dimensionality increases the time consumption and lowers the classification accuracy. Hence, it is necessary to construct low-dimensional feature vectors by selecting the most discriminative features. The merit of Inf-FS is that it assesses the importance of a given feature while taking into account all possible subsets of the features [24]. Therefore, Inf-FS is used to carry out the bearing fault feature selection.

Given a set of features \( F = \left\{ {f^{\left( 1 \right)} , \ldots ,f^{\left( n \right)} } \right\}, \) an undirected fully connected graph G can be built, in which each vertex corresponds to a feature and the edges model pairwise relations among features. Letting the adjacency matrix A represent G, the nature of the edges can be specified [24]: each element \( a_{ij} \) of A represents a pairwise energy term defined as

$$ a_{ij} = \alpha \hbox{max} \left( {\sigma^{\left( i \right)} ,\sigma^{\left( j \right)} } \right) + \left( {1 - \alpha } \right)\left( {1 - \left| {Spearman\left( {f^{\left( i \right)} ,f^{\left( j \right)} } \right)} \right|} \right) $$
(15)

where α is a loading coefficient \( \in \left[ {0,1} \right], \) \( \sigma^{\left( i \right)} \) is the standard deviation over the samples \( \left\{ x \right\} \in f^{\left( i \right)} , \) and Spearman indicates Spearman’s rank correlation coefficient [24]. A high pairwise energy shows that at least one of the features \( f^{\left( i \right)} \) and \( f^{\left( j \right)} \) is discriminative and that the correlation between them is low.
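Eq. (15) can be sketched with NumPy as follows. The function names are hypothetical, and two assumptions go beyond the text: the standard deviations are rescaled by their maximum so that both terms lie in [0, 1], and Spearman's coefficient is computed as the Pearson correlation of ordinal ranks, which ignores ties and is adequate only for continuous-valued features.

```python
import numpy as np

def rank_columns(X):
    """Ordinal rank of each sample within its feature column (ties not averaged)."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)

def pairwise_energy(X, alpha=0.5):
    """Adjacency matrix A of Eq. (15); X has shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)
    sigma = sigma / sigma.max()  # assumption: rescale to [0, 1]
    # Spearman correlation = Pearson correlation of the ranks
    spearman = np.corrcoef(rank_columns(X), rowvar=False)
    max_sigma = np.maximum.outer(sigma, sigma)  # max(sigma_i, sigma_j)
    return alpha * max_sigma + (1 - alpha) * (1 - np.abs(spearman))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 3] = 10.0 * X[:, 0]  # feature 3 is perfectly rank-correlated with feature 0
A = pairwise_energy(X, alpha=0.5)
print(np.round(A, 3))
```

In this example the entry linking features 0 and 3 receives no contribution from the correlation term, since the two features are fully redundant.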

Let a finite path between vertices i and j be denoted as \( \gamma = \left\{ {v_{0} = i,v_{1} , \ldots ,v_{l - 1} ,v_{l} = j} \right\}, \) which is simply a subset of the feature pairs along the path. Then, define the energy of γ as

$$ \varepsilon_{\gamma } = \mathop \prod \limits_{k = 0}^{l - 1} a_{{v_{k} ,v_{k + 1} }} $$
(16)

By extending the path length to infinity, the energy of the ith feature can be calculated as

$$ s\left( i \right) = \mathop \sum \limits_{l = 1}^{\infty } \mathop \sum \limits_{j \in V} \left( {\mathop \sum \limits_{{\gamma \in P_{i,j}^{l} }} \varepsilon_{\gamma } } \right) = \mathop \sum \limits_{l = 1}^{\infty } \mathop \sum \limits_{j \in V} A^{l} \left( {i,j} \right) $$
(17)

where V designates the vertex set, and \( P_{i,j}^{l} \) contains all the paths of length l between i and j [24]. However, the sum of infinitely many \( A^{l} \) terms may diverge. To guarantee the convergence of the infinite sum, a real-valued regularization factor r is introduced as follows

$$\check{s} \left( i \right) = \mathop \sum \limits_{l = 1}^{\infty } \mathop \sum \limits_{j \in V} r^{l} A^{l} \left( {i,j} \right) $$
(18)

Consequently, by using the convergence property of the geometric power series of a matrix [24], \(\check{s} \left( i \right) \) can be efficiently calculated

$$ \check{S}= (I - rA)^{ - 1} - I $$
(19)

For each feature, the final energy score is obtained by marginalizing this quantity, i.e., by summing the corresponding row of \( \check{S} \) with a vector of ones e

$$ \check{s}\left( i \right) = [\check{S}e]_{i} $$
(20)
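Eqs. (18)-(20) can be sketched as follows. The function name `inf_fs_scores` is hypothetical, and the choice r = 0.9/ρ(A), with ρ(A) the spectral radius of A, is an assumption made here so that the geometric series converges; the text does not state the value of r that was used.

```python
import numpy as np

def inf_fs_scores(A, factor=0.9):
    """Energy scores of Eq. (20) from the closed form of Eq. (19)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius of A
    r = factor / rho                            # assumption: keeps rho(rA) < 1
    S = np.linalg.inv(np.eye(n) - r * A) - np.eye(n)  # Eq. (19)
    scores = S @ np.ones(n)                           # Eq. (20)
    ranking = np.argsort(-scores)                     # descending energy
    return scores, ranking, r

rng = np.random.default_rng(0)
B = rng.random((5, 5))
A = (B + B.T) / 2  # symmetric stand-in for the adjacency matrix
scores, ranking, r = inf_fs_scores(A)

# Cross-check against the truncated series of Eq. (18)
term, total = np.ones(5), np.zeros(5)
for _ in range(500):
    term = r * (A @ term)
    total += term
print(np.allclose(scores, total))
```

The cross-check at the end compares the closed form with a direct, truncated evaluation of the series, which is a convenient way to confirm that r was chosen inside the convergence region.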

It should be noted that the higher the final energy score, the more discriminative the feature, namely, the more important the feature. Hence, a rank for the selected features can be obtained by sorting the \(\check{s} \left( i \right) \) energy scores in descending order. In this study, Inf-FS is utilized to choose the most significant features from the original feature set. The parameter α is initially set as 0.5 and its influence on the selection performance is investigated.

4 The Proposed Bearing Fault Diagnosis Approach

By combining the merits of MFME, Inf-FS and SVM, a new fault diagnosis approach for rolling bearing is proposed as follows.

  1. (1)

    The vibration signals of rolling bearing under different running states are acquired by using an accelerometer.

  2. (2)

    The MFME values for each bearing vibration signal are calculated over 20 scales, and then the 20 features are used to form the original feature vector.

  3. (3)

    Inf-FS is utilized to rank the primary 20 features based on their importance. The first few features are selected as the most discriminative features and treated as the new feature vector of low dimension.

  4. (4)

    The new low-dimensional feature subsets are input into the multi-class SVM classifier and the rolling bearing fault diagnosis is fulfilled automatically.

5 Experimental Verification

5.1 Experimental Data

The vibration data obtained from the Bearing Data Center of Case Western Reserve University [28] are used in this paper. A detailed description of the data acquisition apparatus can be found in the aforementioned reference. The bearing vibration data recorded under a load of 0 horsepower, with a corresponding speed of 1797 rpm, were used for analysis. The data are described in detail in Table 1.

Table 1 Experimental data description

5.2 Results and Analysis

To demonstrate the effectiveness of the proposed bearing fault diagnosis method, experimental analyses are carried out. In view of the various fault types and fault diameters, the bearing fault diagnosis is a seven-class identification problem. The data set contains 350 data samples in total. For each bearing state, 20 samples are selected randomly as training data (140 in total), while the remaining 210 samples are used as testing data.
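The random per-class split described above can be sketched as follows. The function name `stratified_split` and the seed are hypothetical, and the label vector assumes 50 samples for each of the seven states, which is inferred from the totals of 350 and 140 rather than stated explicitly.

```python
import numpy as np

def stratified_split(labels, n_train_per_class, seed=0):
    """Randomly pick n_train_per_class training indices from every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.extend(rng.choice(idx, size=n_train_per_class, replace=False))
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

labels = np.repeat(np.arange(7), 50)  # assumption: 50 samples per bearing state
train_idx, test_idx = stratified_split(labels, 20)
print(train_idx.size, test_idx.size)  # 140 210
```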

The temporal waveforms of the bearing vibration signals are given in Fig. 1. As can be seen, it is difficult to identify the various bearing states accurately from the time-domain waveforms alone, so the raw vibration signals need further processing. Since MFE focuses only on the local characteristics of the signals and neglects their global trend, it may be limited in bearing fault feature extraction. Based on this consideration, MFME is introduced to acquire more condition-related information from the bearing vibration signals.

Fig. 1 Temporal waveforms of bearing vibration signals

The MFME values over 20 scales corresponding to the vibration data illustrated in Fig. 1 are given in Fig. 2. It can be observed that, for most scales, the largest entropy values appear when the bearing runs in the healthy state, which shows that the healthy signals are more complex than the faulty ones. This phenomenon can be interpreted as follows [21]. When the rolling bearing runs in good condition, the vibration over most scales is random and irregular and has low self-similarity. In contrast, when the bearing runs with a fault, the self-similarity and regularity of the vibration signals increase, leading to lower entropy values.

Fig. 2 MFME of vibration signals over 20 scales under seven bearing conditions

However, if the MFME values of all 20 scales are used to form the feature vector, the high-dimensional fault feature set will inevitably contain redundant fault information. Moreover, high-dimensional features make the fault diagnosis process time-consuming. For this reason, Inf-FS is introduced to reduce the dimension by ranking the original 20 MFME values and selecting the most representative ones. Based on Inf-FS, the rearranged order of the 20 features is: 4, 17, 3, 18, 16, 8, 7, 5, 19, 15, 20, 14, 13, 2, 6, 11, 12, 9, 10 and 1. The new order of the features over the different scales is presented in Fig. 3.

Fig. 3 New order of MFME features ranked by Inf-FS

After feature selection, the selected features of the training samples are used to form the new feature vectors and to train the multi-class SVM classifier, for which the radial basis function (RBF) kernel is adopted because of its merits [29]. The optimal parameters of the RBF-SVM are obtained by cross-validation. After training, the selected features of the test data are fed into the trained classifier, and the recognition results of the test data versus the number of selected features are given in Table 2. As presented in Table 2, the classification accuracy reaches 100% when the first 5 to 20 features reordered by Inf-FS are used. In view of computational complexity and recognition performance, the first 5 rearranged features are taken as the discriminative features for the fault diagnosis of rolling bearings.
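The training step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' setup: the function name `fit_rbf_svm`, the parameter grid, the 5-fold cross-validation and the synthetic stand-in data are all hypothetical, since the text specifies only that an RBF kernel and cross-validation were used. The ranking uses the first five scales reported above, converted to 0-based column indices.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_rbf_svm(X_train, y_train, ranking, k, cv=5):
    """Grid-search C and gamma of an RBF-SVM on the k top-ranked features."""
    cols = np.asarray(ranking[:k])
    grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0, 10.0]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=cv)
    search.fit(X_train[:, cols], y_train)
    return search, cols

# Synthetic stand-in for the 350 x 20 MFME feature matrix (not real bearing data)
rng = np.random.default_rng(0)
y = np.repeat(np.arange(7), 50)
centers = 3.0 * rng.normal(size=(7, 20))
X = centers[y] + 0.5 * rng.normal(size=(350, 20))
is_train = (np.arange(350) % 50) < 20      # 20 training samples per class

ranking = np.array([4, 17, 3, 18, 16]) - 1  # scales from the text, 0-based
search, cols = fit_rbf_svm(X[is_train], y[is_train], ranking, k=5)
accuracy = search.score(X[~is_train][:, cols], y[~is_train])
print(search.best_params_, accuracy)
```

Repeating the call for k = 1, ..., 20 would produce an accuracy-versus-feature-count curve of the kind summarized in Table 2.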

Table 2 Recognition results of testing data based on MFME and Inf-FS versus different number of features

For comparison, MFE is also used for fault feature extraction from the same vibration data, and Inf-FS is applied for feature selection. The new order of the MFE features according to Inf-FS is: 4, 18, 17, 19, 20, 16, 15, 14, 8, 13, 5, 3, 7, 12, 11, 6, 9, 10, 2 and 1. The classification results using MFE features are shown in Table 3. It can be observed that the identification rate reaches 100% only when the number of features reaches 19. Compared with the five features needed by MFME, this increases the computational complexity and makes the training process time-consuming. If the same number of features as for MFME (five) is used for fault pattern recognition, four samples are misclassified and the classification accuracy is 98.10%. To show the comparison between the MFE-based and MFME-based methods more clearly, the accuracy versus the number of selected features is illustrated in Fig. 4. This comparison indicates that the classification performance of MFME is superior to that of MFE, which suggests that MFME can obtain more fault-related information from the vibration signals of rolling bearings than MFE.
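The quoted rate for five MFE features follows directly from the test set size of 210 samples:

$$ \frac{210 - 4}{210} = \frac{206}{210} \approx 0.9810 $$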

Table 3 Recognition results of testing data based on MFE and Inf-FS versus different number of features
Fig. 4 Recognition rate comparison of MFE and MFME versus number of features

The coefficient α in Inf-FS is a constant between 0 and 1. Generally, the choice of α affects the identification performance. Therefore, the influence of various α values on the classification results is investigated. First, the feature selection results (MFME and MFE) using Inf-FS with varying α values are given in Table 4. Then, the classification results based on MFME and Inf-FS with different α values are presented in Table 5, from which we can see that the recognition rate reaches 100% using five sensitive features in most cases and that an accuracy of 100% is obtained with at most 10 features. Correspondingly, the identification results based on MFE and Inf-FS with various α values are shown in Table 6, from which we can see that the smallest number of features needed for an identification rate of 100% is 9, obtained when α equals 0.2. The above analysis further indicates the superiority of MFME over MFE.

Table 4 Selection results using Inf-FS versus different α values
Table 5 Classification results of testing data based on MFME and Inf-FS with different number of features and varying α values
Table 6 Classification results of testing data based on MFE and Inf-FS with different number of features and varying α values

To further validate the effectiveness of Inf-FS, another feature selection method, the Laplacian score (LS) [30], which was recently applied to bearing fault feature selection [21, 22], is used for comparison. For the same features obtained using MFME, the new order of the features sorted by LS is: 4, 19, 16, 14, 18, 17, 8, 3, 20, 7, 2, 15, 13, 12, 5, 11, 9, 1, 10 and 6. The classification results based on LS and Inf-FS are both illustrated in Fig. 5. As can be seen, the Inf-FS-based identification rates are higher than or equal to those based on LS in most cases, the exception being when two features are selected. The LS-based method achieves an accuracy of 100% when the dimension of the selected features is 8, which is larger than the 5 needed when using Inf-FS. The above analysis indicates that, in contrast with LS, Inf-FS can select more discriminative features from the primary feature set, and that the Inf-FS-based bearing fault diagnosis method achieves better recognition performance.

Fig. 5 Recognition results comparison of Inf-FS and LS versus number of features

To further demonstrate the necessity of feature selection with Inf-FS, five randomly selected features are used for fault pattern recognition. The two groups of features selected at random are the MFME values at scales 4, 8, 12, 15 and 18, and the MFME values at scales 1, 3, 5, 7 and 9. The corresponding classification accuracies are 97.14% and 99.52%, both lower than the 100% obtained when Inf-FS is used for feature selection. This comparison shows the necessity and effectiveness of Inf-FS from another point of view.

Finally, to assess the effects of the parameters N and m, an experimental analysis was carried out with the same data set. The effect of parameter N on the MFME values is shown in Tables 7, 8, 9, 10, 11, 12 and 13 (because of space limitations, only the entropy values of the first eight scales are presented), which correspond to the aforementioned seven bearing conditions, respectively. From Tables 7, 8, 9, 10, 11, 12 and 13, it can be observed that the MFME values are stable for all seven states with N = 1024, 2048, 4096 and 8192, as indicated by the corresponding standard deviations. These results are consistent with the discussion given in [12, 13], namely that the calculation of fuzzy entropy values depends only weakly on the data length.

Table 7 The MFME values of first 8 scales with different N values under normal condition
Table 8 The MFME values of first 8 scales with different N values under IRF1 condition
Table 9 The MFME values of first 8 scales with different N values under IRF2 condition
Table 10 The MFME values of first 8 scales with different N values under ORF1 condition
Table 11 The MFME values of first 8 scales with different N values under ORF2 condition
Table 12 The MFME values of first 8 scales with different N values under BF1 condition
Table 13 The MFME values of first 8 scales with different N values under BF2 condition

Subsequently, the effect of parameter m on the experimental results was investigated with N = 2048. In addition to the result with m = 2 discussed above (Fig. 4), the classification results with m = 1, 3 and 4 are illustrated in Figs. 6, 7 and 8, respectively. From Figs. 6, 7 and 8, it can be seen that the MFME-based classification rates are higher than or equal to those based on MFE in most cases. These results further validate the ability of MFME to extract bearing fault features.

Fig. 6 Recognition rate comparison of MFE and MFME versus number of features (m = 1)

Fig. 7 Recognition rate comparison of MFE and MFME versus number of features (m = 3)

Fig. 8 Recognition rate comparison of MFE and MFME versus number of features (m = 4)

6 Conclusion

A new fault diagnosis approach for rolling bearings is put forward by combining MFME, Inf-FS and SVM. The MFME algorithm is proposed for extracting the fault features of rolling bearings, which form the original high-dimensional feature vectors. To reduce the feature dimension and improve classification accuracy, the Inf-FS method is adopted to choose the most discriminative features. The selected features form the low-dimensional feature vectors, which are then used for fault pattern recognition. For comparison, MFE is also applied to the same vibration data, and the results demonstrate that MFME can acquire more of the state-related information hidden in the vibration signals and achieves better recognition performance. In addition, the necessity and effectiveness of Inf-FS are verified through experimental analysis, and the comparison with LS shows the superiority of Inf-FS for feature selection. Finally, the experimental analysis demonstrates that the proposed diagnosis method can effectively identify different working conditions of rolling bearings.