1 Introduction

Rolling bearings are important components of rotating machinery, and their operation directly affects the working condition of the whole mechanical equipment. A bearing failure can cause serious safety risks in the manufacturing process. Therefore, on-line monitoring and fault diagnosis of rolling bearings are of great significance [2, 28]. For this reason, fault diagnosis of rolling bearings has become a research focus, and many vibration analysis methodologies have been proposed. When a rolling bearing fails, collisions occur between the faulty part and other components, and non-stationary, non-linear shock signals can be obtained from the sensor installed on the device. This is the basic principle underlying these analysis methodologies.

Most methods include two typical steps: feature extraction and selection, and condition classification. In the first step, time domain, frequency domain, and time-frequency domain analyses are often applied [21]. Time domain features extracted from the raw signals, such as the peak-to-peak value, root mean square value, and kurtosis indicator, can be used, but some information is not easily observed in this domain. Frequency domain analysis addresses this problem by applying the FFT to the raw vibration signal and then analyzing the power spectrum, kurtosis spectrum, order cepstrum, envelope spectrum, etc. for diagnosis [5, 11, 32]. However, frequency domain analysis has limited analytical capability for non-stationary signals, which is why time-frequency domain analysis has been used for the diagnosis of non-stationary signals [8]. No matter which domain is adopted, most of the extracted features are redundant. It is therefore necessary to select the representative information using methods such as PCA [36], Kullback-Leibler (K–L) divergence [3], the distance evaluation technique [18], feature discriminant analysis, and compressed sensing [1, 12].
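As a small illustration of the time domain features named above, the following sketch computes the peak-to-peak value, root mean square value, and kurtosis of a short sequence. The function name `time_domain_features` and the two test signals are assumptions made for this example only; they are not taken from the experiments in this paper.

```python
import math

def time_domain_features(x):
    """Return (peak-to-peak, RMS, kurtosis) of a 1-D sequence of samples."""
    n = len(x)
    mean = sum(x) / n
    peak_to_peak = max(x) - min(x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    kurtosis = m4 / (m2 * m2)                  # non-excess kurtosis
    return peak_to_peak, rms, kurtosis

# Illustrative signals: a smooth square wave and a single sharp impulse
square = [1.0, -1.0] * 8
impulsive = [0.0] * 15 + [4.0]
print(time_domain_features(square))
print(time_domain_features(impulsive))
```

The impulsive sequence yields a much larger kurtosis than the square wave, which is the reason kurtosis is commonly used as an indicator of shock-like fault signatures.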

In addition, other transform domain analyses have proved effective. In particular, Empirical Mode Decomposition (EMD) and Local Mean Decomposition (LMD) are widely used in feature extraction [22, 23, 31, 41], but in both methods the decomposition error is large and the decomposition result is susceptible to the sampling frequency. Because EMD and LMD are both recursive modal decompositions, modal aliasing occurs, which makes it difficult to separate components with similar frequencies, and the end effect also appears. Compared with EMD and LMD, Variational Mode Decomposition (VMD) shows great advantages in bearing fault diagnosis [37, 42]. VMD determines the center frequency and bandwidth of each decomposition mode by iteratively searching for the optimal solution of a variational model. It is a non-recursive, variational modal decomposition, which can avoid modal aliasing and successfully separate two pure harmonic signals with similar frequencies. It offers high precision and fast convergence and shows good robustness to noise. Therefore, VMD is used in this paper, and the Tsallis entropy of each mode is calculated as the signal feature.

In the second step, artificial intelligence algorithms have been proposed, such as the support vector machine (SVM) [38], artificial neural network (ANN) [16], random forest [40], etc. In addition, some algorithms are constructed based on the objects [43]. Most of these algorithms can be used for rolling bearing diagnosis, but they depend heavily on feature extraction and on considerable prior knowledge of the signal.

With the development of deep learning, the concept of building a network has been applied in many areas, including image-text matching [10, 15], social multimedia [13, 14, 35], and fault diagnosis [6, 7, 39]. Many deep learning-based bearing diagnosis methods have also been proposed [29, 30]. In particular, the CNN has excellent feature extraction ability. The attention mechanism has been introduced to help the deep network extract discriminative features and visualize the learned diagnosis knowledge effectively when only a small data set is available [19]. In addition, a multi-layer structure has been used as the main architecture for fault diagnosis and proved efficient [20]. However, most deep learning-based methods require a long training time for the learning model, and the running process cannot be inspected, which makes them inflexible. Therefore, considering the actual operation of bearing signals, the two steps of feature extraction and selection and condition classification are adopted. FCM, which has been proved convergent [24,25,26,27], is used for classification after the feature extraction.

Specifically, the method proposed in this paper is as follows: 1) VMD is adopted to decompose the obtained raw signal. 2) The Tsallis entropy of each component obtained by VMD is used as the signal feature, since Tsallis entropy can handle the non-extensive problem of the system. 3) The fuzzy c-means clustering algorithm (FCM) is applied for diagnosis. The remainder of this paper is organized as follows. In Section 2, the feature extraction is presented. The FCM is described in Section 3. The overall framework is presented in Section 4. The experimental results are shown and discussed in Section 5, and the conclusion is presented in Section 6.

2 Feature extraction

In the feature extraction, the obtained vibration signal is first decomposed with VMD to get a series of band-limited intrinsic mode functions (BIMFs). Then, the Tsallis entropy of each BIMF component is calculated and used as the signal feature. The VMD and the Tsallis entropy calculation are described as follows.

2.1 VMD

VMD is a non-recursive decomposition method that decomposes a complex multi-component signal into amplitude-modulated and frequency-modulated (AM-FM) component signals. The basic process of the algorithm is as follows: first, each eigenmode function is assumed to have a limited bandwidth around its own center frequency; then, the variational problem is solved, with each eigenmode function demodulated to its corresponding baseband, so as to minimize the sum of the estimated bandwidths of the eigenmode functions; finally, each eigenmode function and its corresponding center frequency are extracted.

A real signal f(t) is decomposed into K sparse and independent sub-signals, each of which can be written in the AM-FM form:

$$ {u}_k(t)={A}_k(t)\cos \left[{\varphi}_k(t)\right] $$
(1)

Here {uk(t)} = {u1(t), u2(t), ⋯, uK(t)}, (k = 1, 2, ⋯, K) are the K IMF components obtained by the VMD decomposition of the signal f(t). φk(t) is a non-decreasing phase function, i.e., \( {\varphi}_k^{\prime }(t)\ge 0 \), and Ak(t) is the instantaneous amplitude (envelope) of uk(t), which satisfies Ak(t) ≥ 0.

The instantaneous frequency of uk(t) is

$$ {\omega}_k(t)={\varphi}_k^{\prime }(t)=\frac{d{\varphi}_k(t)}{dt} $$
(2)

Obviously, Ak(t) and ωk(t) vary much more slowly than the phase φk(t); that is, within the interval [t − δ, t + δ] (where \( \delta \approx 2\pi /{\varphi}_k^{\prime }(t) \)), uk(t) can be regarded as a harmonic signal with amplitude Ak(t) and frequency ωk(t).

Assuming that each mode of the signal has a limited bandwidth around a center frequency, the variational problem can be described as seeking K modal functions uk(t) such that the sum of the estimated bandwidths of the modes is minimized, subject to the constraint that the modes sum to the original input signal f(t).

Specifically, the analytic signal of each modal function uk(t) is obtained through the Hilbert transform, which yields its unilateral frequency spectrum:

$$ \left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t) $$
(3)

Where δ(t) is the unit pulse function, j is the imaginary unit, and * denotes convolution.
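To make the analytic signal of Eq. (3) concrete, the sketch below builds its discrete counterpart by keeping the non-negative half of the spectrum and discarding the negative half. The helper names `dft` and `analytic_signal` and the test tone are assumptions for this illustration; a naive O(N²) transform is used only to keep the example free of dependencies.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Discrete counterpart of (delta(t) + j/(pi t)) * x(t).

    Positive frequencies are doubled, negative frequencies are zeroed,
    and the DC (and Nyquist, for even length) bins are left unchanged.
    """
    n = len(x)
    half = n // 2
    spectrum = dft(x)
    for k in range(n):
        if 0 < k < half or (n % 2 == 1 and k == half):
            spectrum[k] *= 2
        elif k > half:
            spectrum[k] = 0
    return dft(spectrum, inverse=True)

# A cosine with 3 cycles in 32 samples; its analytic signal is a complex exponential
n = 32
x = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
z = analytic_signal(x)
envelope = [abs(v) for v in z]   # instantaneous amplitude A(t)
print(max(abs(e - 1.0) for e in envelope))
```

The spectrum of `z` is one-sided, which is what allows the later frequency shift by the center frequency to move each mode to baseband. In practice a library routine such as `scipy.signal.hilbert` returns the same analytic signal.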

Then the analytic signal of each mode is multiplied by an exponential \( {e}^{-j{\omega}_kt} \) tuned to its estimated center frequency ωk, so that the spectrum of each mode is modulated to the corresponding baseband:

$$ \left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt} $$
(4)

Where {ωk} = {ω1, ω2, ⋯, ωK}, (k = 1, 2, ⋯K) is the center frequency of each {uk(t)}.

The bandwidth of each modal signal is estimated by the squared L2 norm of the gradient of the demodulated signal. The constrained variational problem is expressed as follows:

$$ {\displaystyle \begin{array}{l}\underset{\left\{{u}_k\right\},\left\{{\omega}_k\right\}}{\min}\left\{\sum \limits_{k=1}^K{\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt}\right\Vert}_2^2\right\},\\ {}s.t.\kern0.5em \sum \limits_{k=1}^K{u}_k(t)=f(t)\end{array}} $$
(5)

To find the optimal solution of the above constrained variational model, the constrained problem is transformed into an unconstrained one by introducing a quadratic penalty factor and a Lagrange multiplier. The augmented Lagrangian function is:

$$ L\left(\left\{{u}_k\right\},\left\{{\omega}_k\right\},\lambda \right)=\alpha \sum \limits_{k=1}^K{\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt}\right\Vert}_2^2+{\left\Vert f(t)-\sum \limits_{k=1}^K{u}_k(t)\right\Vert}_2^2+\left\langle \lambda (t),f(t)-\sum \limits_{k=1}^K{u}_k(t)\right\rangle $$
(6)

Where α is the quadratic penalty factor, which guarantees the reconstruction accuracy of the signal in the presence of Gaussian noise; λ(t) is the Lagrange multiplier, which enforces the constraint strictly; and 〈⋅, ⋅〉 denotes the inner product.

Next, the Alternating Direction Method of Multipliers (ADMM) is adopted. The saddle point of the augmented Lagrangian is sought by alternately updating \( {u}_k^{n+1} \), \( {\omega}_k^{n+1} \), and \( {\lambda}^{n+1} \).

The problem of solving \( {u}_k^{n+1} \) can be expressed as:

$$ {u}_k^{n+1}=\underset{u_k\in X}{\arg \min}\left\{\alpha {\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt}\right\Vert}_2^2+{\left\Vert f(t)-\sum \limits_{i=1}^K{u}_i(t)+\frac{\lambda (t)}{2}\right\Vert}_2^2\right\} $$
(7)

Where X is the solution space of uk. This problem is solved in the frequency domain using the Parseval/Plancherel Fourier isometry:

$$ {\overset{\frown }{u}}_k^{n+1}\left(\omega \right)=\underset{{\overset{\frown }{u}}_k,{u}_k\in X}{\arg \min}\left\{\alpha {\left\Vert j\omega \left[\left(1+\operatorname{sgn}\left(\omega +{\omega}_k\right)\right){\overset{\frown }{u}}_k\left(\omega +{\omega}_k\right)\right]\right\Vert}_2^2+{\left\Vert \overset{\frown }{f}\left(\omega \right)-\sum \limits_{i=1}^K{\overset{\frown }{u}}_i\left(\omega \right)+\frac{\overset{\frown }{\lambda}\left(\omega \right)}{2}\right\Vert}_2^2\right\} $$
(8)

Where \( \overset{\frown }{u} \), \( \overset{\frown }{f} \), and \( \overset{\frown }{\lambda } \) are the Fourier transforms of the corresponding time signals. Applying the change of variables ω ← ω − ωk in the first term of Eq. (8) gives:

$$ {\overset{\frown }{u}}_k^{n+1}\left(\omega \right)=\underset{{\overset{\frown }{u}}_k,{u}_k\in X}{\arg \min}\left\{\alpha {\left\Vert j\left(\omega -{\omega}_k\right)\left[\left(1+\operatorname{sgn}\left(\omega \right)\right){\overset{\frown }{u}}_k\left(\omega \right)\right]\right\Vert}_2^2+{\left\Vert \overset{\frown }{f}\left(\omega \right)-\sum \limits_{i=1}^K{\overset{\frown }{u}}_i\left(\omega \right)+\frac{\overset{\frown }{\lambda}\left(\omega \right)}{2}\right\Vert}_2^2\right\} $$
(9)

Using the Hermitian symmetry of the real signals in the reconstruction fidelity term, both terms can be written as half-space integrals over the non-negative frequencies:

$$ {\overset{\frown }{u}}_k^{n+1}\left(\omega \right)=\underset{{\overset{\frown }{u}}_k,{u}_k\in X}{\arg \min}\left\{{\int}_0^{\infty}\left[4\alpha {\left(\omega -{\omega}_k\right)}^2{\left|{\overset{\frown }{u}}_k\left(\omega \right)\right|}^2+2{\left|\overset{\frown }{f}\left(\omega \right)-\sum \limits_{i=1}^K{\overset{\frown }{u}}_i\left(\omega \right)+\frac{\overset{\frown }{\lambda}\left(\omega \right)}{2}\right|}^2\right]\mathrm{d}\omega \right\} $$
(10)

The solution to this quadratic optimization problem is:

$$ {\overset{\frown }{u}}_k^{n+1}\left(\omega \right)=\frac{\overset{\frown }{f}\left(\omega \right)-{\sum}_{i\ne k}{\overset{\frown }{u}}_i\left(\omega \right)+\frac{\overset{\frown }{\lambda}\left(\omega \right)}{2}}{1+2\alpha {\left(\omega -{\omega}_k\right)}^2} $$
(11)
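The step from Eq. (10) to Eq. (11) can be sketched as follows. The shorthand \( {\overset{\frown }{r}}_k \) is introduced here only for illustration and collects everything in the fidelity term that does not depend on \( {\overset{\frown }{u}}_k \):

$$ {\overset{\frown }{r}}_k\left(\omega \right)=\overset{\frown }{f}\left(\omega \right)-\sum \limits_{i\ne k}{\overset{\frown }{u}}_i\left(\omega \right)+\frac{\overset{\frown }{\lambda}\left(\omega \right)}{2} $$

With this shorthand, the integrand of Eq. (10) at each frequency is \( 4\alpha {\left(\omega -{\omega}_k\right)}^2{\left|{\overset{\frown }{u}}_k\right|}^2+2{\left|{\overset{\frown }{r}}_k-{\overset{\frown }{u}}_k\right|}^2 \). Since the integrand can be minimized pointwise, setting its derivative with respect to the complex conjugate of \( {\overset{\frown }{u}}_k \) to zero gives

$$ 4\alpha {\left(\omega -{\omega}_k\right)}^2{\overset{\frown }{u}}_k\left(\omega \right)-2\left({\overset{\frown }{r}}_k\left(\omega \right)-{\overset{\frown }{u}}_k\left(\omega \right)\right)=0\kern1em \Rightarrow \kern1em {\overset{\frown }{u}}_k\left(\omega \right)=\frac{{\overset{\frown }{r}}_k\left(\omega \right)}{1+2\alpha {\left(\omega -{\omega}_k\right)}^2} $$

which is Eq. (11). The result has the form of a Wiener filter centered at ωk, whose bandwidth narrows as α increases.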

In addition, the center frequency ωk appears only in the bandwidth term and not in the reconstruction fidelity term, so its update can be expressed as:

$$ {\omega}_k^{n+1}=\underset{\omega_k}{\arg \min}\left\{{\left\Vert {\partial}_t\left[\left(\delta (t)+\frac{j}{\pi t}\right)\ast {u}_k(t)\right]{e}^{-j{\omega}_kt}\right\Vert}_2^2\right\} $$
(12)

Solving this problem in the frequency domain gives:

$$ {\omega}_k^{n+1}=\frac{\int_0^{\infty}\omega {\left|{\overset{\frown }{u}}_k\left(\omega \right)\right|}^2\mathrm{d}\omega }{\int_0^{\infty }{\left|{\overset{\frown }{u}}_k\left(\omega \right)\right|}^2\mathrm{d}\omega } $$
(13)

Obviously, ωk is at the center of gravity of the corresponding modal power spectrum.
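A discrete version of Eq. (13) is a power-weighted mean over the sampled non-negative frequencies. The minimal sketch below assumes the mode spectrum is available on a uniform frequency grid; the function name `center_of_gravity` and the sample values are illustrative assumptions.

```python
def center_of_gravity(freqs, spectrum):
    """Discrete form of Eq. (13): power-weighted mean of non-negative frequencies."""
    power = [abs(v) ** 2 for v in spectrum]
    return sum(w * p for w, p in zip(freqs, power)) / sum(power)

# Illustrative mode spectrum, symmetric around the 200 Hz bin
freqs = [0.0, 100.0, 200.0, 300.0, 400.0]
mode = [0.0, 1.0 + 0.0j, 2.0j, 1.0, 0.0]
print(center_of_gravity(freqs, mode))
```

Only the magnitude of each spectral sample matters, so the phase of the mode does not influence the estimated center frequency.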

\( {\overset{\frown }{u}}_k\left(\omega \right) \) and ωk are iterated using Eqs. (11) and (13) to obtain the optimal solution. Generally, the iteration terminates at step n when the following criterion is satisfied:

$$ \frac{\sum \limits_{k=1}^K{\left\Vert {\overset{\frown }{u}}_k^{n-1}-{\overset{\frown }{u}}_k^n\right\Vert}_2^2}{\sum \limits_{k=1}^K{\left\Vert {\overset{\frown }{u}}_k^n\right\Vert}_2^2}<e $$
(14)

Where e (e > 0) is the prescribed convergence precision.

According to the above description, the specific process of the VMD algorithm is as follows:

  (1) Initialize \( \left\{{\overset{\frown }{u}}_k^1\left(\omega \right)\right\} \), \( \left\{{\omega}_k^1\right\} \), \( {\overset{\frown }{\lambda}}^1\left(\omega \right) \), and set n ← 0;

  (2) Repeat: set n ← n + 1 and, for k = 1, 2, ⋯, K, perform steps (a) and (b), followed by step (c).

    (a) Update \( {\overset{\frown }{u}}_k \) for all ω ≥ 0:

$$ {\overset{\frown }{u}}_k^{n+1}\left(\omega \right)\leftarrow \frac{\overset{\frown }{f}\left(\omega \right)-\sum \limits_{i<k}{\overset{\frown }{u}}_i^{n+1}\left(\omega \right)-\sum \limits_{i>k}{\overset{\frown }{u}}_i^n\left(\omega \right)+\frac{{\overset{\frown }{\lambda}}^n\left(\omega \right)}{2}}{1+2\alpha {\left(\omega -{\omega}_k^n\right)}^2} $$
(15)

The quadratic penalty factor α improves convergence. Especially when the signal contains noise, the combination of the Lagrange multiplier and the quadratic penalty term allows the signal to be reconstructed accurately.

    (b) Update ωk:

$$ {\omega}_k^{n+1}\leftarrow \frac{\int_0^{\infty}\omega {\left|{\overset{\frown }{u}}_k^{n+1}\left(\omega \right)\right|}^2 d\omega}{\int_0^{\infty }{\left|{\overset{\frown }{u}}_k^{n+1}\left(\omega \right)\right|}^2 d\omega} $$
(16)

Where \( {\overset{\frown }{u}}_k \) is the modal function in the frequency domain; \( \overset{\frown }{\lambda } \) is the Lagrange multiplier in the frequency domain, which enforces the constraint; and \( \overset{\frown }{f} \) is the original signal in the frequency domain.

    (c) Dual ascent for all ω ≥ 0:

$$ {\overset{\frown }{\lambda}}^{n+1}\left(\omega \right)\leftarrow {\overset{\frown }{\lambda}}^n\left(\omega \right)+\gamma \left(\overset{\frown }{f}\left(\omega \right)-\sum \limits_{k=1}^K{\overset{\frown }{u}}_k^{n+1}\left(\omega \right)\right) $$
(17)

In this formula, γ is the noise tolerance coefficient. To achieve a good de-noising effect, it can be set to γ = 0.

  (3) Stop when the convergence condition \( \sum \limits_{k=1}^K\left({\left\Vert {\overset{\frown }{u}}_k^{n+1}-{\overset{\frown }{u}}_k^n\right\Vert}_2^2/{\left\Vert {\overset{\frown }{u}}_k^n\right\Vert}_2^2\right)<e \) is met. At the end of the iteration, K components are output.
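The iteration above can be sketched in a few dozen lines. The following is a simplified illustration under stated assumptions, not the implementation used in this paper: it works directly on the non-negative half of a naive DFT, omits the mirror extension of the signal that practical VMD codes usually apply, initializes the center frequencies uniformly, and returns the mode spectra rather than time-domain modes. The names `dft`, `vmd_spectra`, and the default values of `alpha`, `tol`, and `max_iter` are assumptions chosen for the example; frequencies are normalized to cycles per sample.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for a short demonstration signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def vmd_spectra(signal, K, alpha=2000.0, gamma=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD on the non-negative half spectrum (no boundary mirroring).

    Returns (mode spectra, center frequencies in cycles per sample).
    """
    n = len(signal)
    half = n // 2 + 1
    freqs = [k / n for k in range(half)]               # grid on [0, 0.5]
    f_hat = dft(signal)[:half]
    u_hat = [[0j] * half for _ in range(K)]
    omega = [(k + 0.5) * 0.5 / K for k in range(K)]    # uniform start (assumption)
    lam = [0j] * half
    for _ in range(max_iter):
        previous = [row[:] for row in u_hat]
        for k in range(K):                             # Eq. (15), modes updated in turn
            for w in range(half):
                others = sum(u_hat[i][w] for i in range(K) if i != k)
                u_hat[k][w] = (f_hat[w] - others + lam[w] / 2) / (
                    1 + 2 * alpha * (freqs[w] - omega[k]) ** 2)
        for k in range(K):                             # Eq. (16), center of gravity
            power = [abs(v) ** 2 for v in u_hat[k]]
            omega[k] = sum(f * p for f, p in zip(freqs, power)) / sum(power)
        for w in range(half):                          # Eq. (17), dual ascent
            lam[w] += gamma * (f_hat[w] - sum(u_hat[k][w] for k in range(K)))
        change = 0.0
        for k in range(K):
            num = sum(abs(a - b) ** 2 for a, b in zip(u_hat[k], previous[k]))
            den = sum(abs(b) ** 2 for b in previous[k])
            change += num / (den + 1e-30)              # guard: spectra start at zero
        if change < tol:
            break
    return u_hat, omega

# Two tones placed exactly on DFT bins 8 and 24 of a 128-sample signal
n = 128
signal = [math.cos(2 * math.pi * 8 * t / n) + 0.5 * math.cos(2 * math.pi * 24 * t / n)
          for t in range(n)]
modes, omega = vmd_spectra(signal, K=2)
print([round(w, 4) for w in sorted(omega)])
```

For this synthetic two-tone signal the estimated center frequencies settle near 8/128 and 24/128 cycles per sample. With γ = 0 the multiplier stays at zero, so the reconstruction is enforced only through the quadratic penalty, which matches the de-noising setting described above.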

It should be noted that the number of decomposition layers K affects the decomposition results. The specific impact has been explained in Reference [17], and the optimal number of decomposition layers for rolling bearing diagnosis is determined experimentally in Section 5.

2.2 Tsallis entropy

The concept of Shannon entropy was first proposed by the American scholar C.E. Shannon in 1948. The theory states that if an event has multiple possible outcomes and the probability of each outcome is pi (i = 1, 2, ⋯, N), the information obtained from a certain outcome can be expressed by Ii = logα(1/pi), and the information entropy defined for a time series is

$$ {S}_{BG}^{(d)}=-k\sum \limits_{i=1}^N{p}_i\ln {p}_i $$
(18)

Where k = 1. Obviously, Shannon entropy is based on thermodynamic B-G entropy, and it is extensive.

Tsallis entropy introduces a non-extensive parameter q on the basis of Shannon entropy and constructs a new form of entropy function. It can be expressed as:

$$ {S}_q^{(d)}=\frac{k}{q-1}\left(1-\int f{(x)}^q\mathrm{d}x\right),q\in R $$
(19)

Where f(x) is the probability density function, which satisfies ∫f(x)dx = 1, and q is the non-extensive parameter.

In addition, Tsallis entropy can be expressed discretely:

$$ {S}_q^{(d)}=\frac{k}{q-1}\left(1-\sum \limits_{i=1}^n\left({p}_i^q\right)\right),q\in R $$
(20)

Where pi is the probability of the i-th state of the random variable, which satisfies \( \sum \limits_{i=1}^n{p}_i=1 \), and k is a constant. In this paper k = 1.
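A minimal sketch of Eq. (20) is given below. The paper does not state how the probabilities pi are estimated from a BIMF component, so the histogram-based estimator `histogram_probabilities` is a hypothetical choice made for this example, as are the function names and the default of 8 bins.

```python
def tsallis_entropy(p, q, k=1.0):
    """Discrete Tsallis entropy of Eq. (20) for a probability vector p (q != 1)."""
    return k / (q - 1.0) * (1.0 - sum(pi ** q for pi in p if pi > 0))

def histogram_probabilities(x, bins=8):
    """Hypothetical estimator: relative frequencies of x over equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0      # fall back to 1.0 for a constant signal
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(x) for c in counts]

# Uniform distribution over four states with q = 2
print(tsallis_entropy([0.25] * 4, q=2.0))

# Entropy of a short illustrative sequence via the histogram estimate
samples = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0, 0.2, 0.8]
print(tsallis_entropy(histogram_probabilities(samples, bins=4), q=2.0))
```

Empty bins are skipped so that the function also works for q < 0, where a zero probability raised to the power q would be undefined.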

The choice of the non-extensive parameter q for different tested systems is of great significance to the calculation of Tsallis entropy. It describes the degree of non-extensivity of the tested system, and for two independent subsystems A and B the system entropy satisfies the following pseudo-additivity:

$$ \frac{S_q\left(A+B\right)}{k}=\frac{S_q(A)}{k}+\frac{S_q(B)}{k}+\left(1-q\right)\frac{S_q(A){S}_q(B)}{k^2} $$
(21)

This makes the information measure more targeted and flexible. q < 1 and q > 1 correspond to super-extensive and sub-extensive systems, respectively. In particular, as q → 1, Tsallis entropy reduces to Shannon entropy, as shown in Eq. (22). Therefore, Tsallis entropy, as an extension of Shannon entropy, can also describe systems with extensive characteristics, and it is often used in the analysis of random complex signals.

$$ \underset{q\to 1}{\lim }{S}_q^{(d)}=\underset{q\to 1}{\lim}\frac{k}{q-1}\left(1-\sum \limits_{i=1}^n{p}_i^q\right)=\underset{q\to 1}{\lim}\frac{k}{q-1}\left(\sum \limits_{i=1}^n{p}_i\left(1-{p}_i^{q-1}\right)\right)=-k\sum \limits_{i=1}^n{p}_i\ln {p}_i={S}_{BG}^{(d)} $$
(22)
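Both the pseudo-additivity of Eq. (21) and the limit of Eq. (22) can be checked numerically. The sketch below takes k = 1 and uses two small, arbitrarily chosen probability vectors as independent subsystems; the names `tsallis` and `shannon` and all numeric values are assumptions for illustration.

```python
import math

def tsallis(p, q):
    """Eq. (20) with k = 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def shannon(p):
    """Eq. (18) with k = 1."""
    return -sum(pi * math.log(pi) for pi in p)

a = [0.5, 0.3, 0.2]
b = [0.6, 0.4]
joint = [pa * pb for pa in a for pb in b]   # joint distribution of independent A and B

# Pseudo-additivity, Eq. (21)
q = 1.5
lhs = tsallis(joint, q)
rhs = tsallis(a, q) + tsallis(b, q) + (1 - q) * tsallis(a, q) * tsallis(b, q)
print(abs(lhs - rhs))

# Limit q -> 1, Eq. (22): Tsallis entropy approaches Shannon entropy
gap = abs(tsallis(a, 1.0 + 1e-6) - shannon(a))
print(gap)
```

Both printed differences are close to zero. The cross term (1 − q)S(A)S(B) is what distinguishes Tsallis entropy from the additive Shannon case, and it vanishes as q approaches 1.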

In this paper, Tsallis entropy is suitable because of the randomness of the vibration signal from a faulty rolling bearing. After the vibration signal is decomposed by VMD, K eigenmode functions are obtained. An appropriate non-extensive parameter q is then chosen to calculate the Tsallis entropy of each eigenmode function. The fault information of the signal can be distinguished according to the change of the entropy value [9, 33, 34].

3 FCM

The FCM algorithm is a partition-based clustering algorithm and an improvement of the classic C-means algorithm. Its principle is to maximize the similarity between objects assigned to the same cluster and minimize the similarity between objects in different clusters. To this end, the sum of the squared Euclidean distances between all data points and each cluster center, weighted by the fuzzy memberships, is minimized; the fuzzy classification matrix and the cluster centers are then corrected continuously until the convergence constraint for a given precision is met. Finally, the data points are clustered according to their similarity.

Assume the sample set is X = {x1, x2, ⋯, xn}, where n is the number of samples. The cluster center vector is V = [v1, v2, ⋯, vc]T, where c is the number of cluster centers. The fuzzy classification matrix is U = [uij]c × n, where uij is the membership degree of the data point xj relative to the cluster center vi. The clustering objective function is

$$ {J}_{fcm}\left(U,V\right)=\sum \limits_{j=1}^n\sum \limits_{i=1}^c{u}_{ij}^m{d}_{ij}^2 $$
(23)

Where dij is the Euclidean distance from the data point xj to the cluster center vi, expressed as dij = ‖xj − vi‖. The parameter m is the fuzzy weighted index, generally m = 2. In addition, the following constraints are introduced in the FCM algorithm, and the minimum of the objective function is found by calculating U and V iteratively under these constraints.

$$ \left\{\begin{array}{c}0\le {u}_{ij}\le 1\\ {}\sum \limits_{i=1}^c{u}_{ij}=1\\ {}\sum \limits_{j=1}^n{u}_{ij}>0\end{array}\right.\kern1.00em 1\le i\le c,\kern0.5em 1\le j\le n $$
(24)

Specific steps:

  1) Set the number of cluster centers c, the precision ε (ε > 0) and the fuzzy weighted index m, initialize the fuzzy classification matrix, and set the iteration number l = 0.

  2) Update the cluster centers vi:

$$ {v}_i=\sum \limits_{j=1}^n{u}_{ij}^m{x}_j/\sum \limits_{j=1}^n{u}_{ij}^m $$
(25)

  3) Update the fuzzy classification matrix U:

$$ {u}_{ij}=1/\sum \limits_{k=1}^c{\left(\frac{d_{ij}}{d_{kj}}\right)}^{2/\left(m-1\right)} $$
(26)

  4) Determine whether U satisfies the constraint:

$$ \left\Vert {U}^{l+1}-{U}^l\right\Vert <\varepsilon $$
(27)

If the constraint is satisfied, stop the iteration; otherwise set l = l + 1 and repeat steps 2) and 3) to get the optimal result.
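The steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the membership matrix is initialized randomly with a fixed seed, the norm in Eq. (27) is taken as the largest absolute change of any membership, and squared distances are floored at a small value to avoid division by zero when a point coincides with a center. The function name `fcm`, its parameters, and the toy data are hypothetical.

```python
import random

def fcm(points, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Fuzzy c-means on a list of equal-length tuples; returns (U, V)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random membership matrix whose columns sum to one
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= total
    V = []
    for _ in range(max_iter):
        # Step 2: cluster centers, Eq. (25)
        V = []
        for i in range(c):
            weights = [U[i][j] ** m for j in range(n)]
            V.append([sum(w * x[d] for w, x in zip(weights, points)) / sum(weights)
                      for d in range(dim)])
        # Step 3: memberships, Eq. (26), written with squared distances
        new_U = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d2 = [max(sum((points[j][d] - V[i][d]) ** 2 for d in range(dim)), 1e-12)
                  for i in range(c)]
            for i in range(c):
                new_U[i][j] = 1.0 / sum((d2[i] / d2[k]) ** (1.0 / (m - 1.0))
                                        for k in range(c))
        # Step 4: stop when the largest membership change is below eps, Eq. (27)
        delta = max(abs(new_U[i][j] - U[i][j]) for i in range(c) for j in range(n))
        U = new_U
        if delta < eps:
            break
    return U, V

# Two well-separated one-dimensional groups (illustrative data)
points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
U, V = fcm(points, c=2)
print(sorted(round(v[0], 2) for v in V))
```

Because the exponent in Eq. (26) applies to a ratio of distances, the code uses squared distances with the exponent 1/(m − 1), which is equivalent and avoids a square root. The order of the returned clusters depends on the random initialization.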

In addition, the clustering effect can be evaluated by the classification coefficient F and the average fuzzy entropy H. The closer F is to 1 and the closer H is to 0, the better the clustering effect.

$$ F=\frac{1}{n}\sum \limits_{j=1}^n\sum \limits_{i=1}^c{u}_{ij}^2 $$
(28)
$$ H=-\frac{1}{n}\sum \limits_{j=1}^n\sum \limits_{i=1}^c{u}_{ij}\ln {u}_{ij} $$
(29)
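The two indices of Eqs. (28) and (29) can be computed directly from a membership matrix. In the sketch below, the function names and the two example matrices are assumptions for illustration, and the convention 0 · ln 0 = 0 is used for memberships that are exactly zero.

```python
import math

def partition_coefficient(U):
    """Classification coefficient F of Eq. (28); U is a c x n membership matrix."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n

def average_fuzzy_entropy(U):
    """Average fuzzy entropy H of Eq. (29), with the convention 0 * ln 0 = 0."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / n

crisp = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # every point fully in one cluster
fuzzy = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]   # every point split evenly
print(partition_coefficient(crisp), abs(average_fuzzy_entropy(crisp)))
print(partition_coefficient(fuzzy), average_fuzzy_entropy(fuzzy))
```

The crisp partition attains the ideal values F = 1 and H = 0, while the evenly split partition with c = 2 gives F = 1/2 and H = ln 2, which are the least informative values for two clusters.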

4 Our methodology

The fault diagnosis method for rolling bearings in this paper is described in Fig. 1: (1) Collect the vibration signal, set the quadratic penalty factor α and the decomposition level K, and perform VMD decomposition on the vibration signal. (2) Through continuous optimization iterations, when the parameters meet the convergence constraint of a given precision e (e > 0), K BIMF components are output. (3) Set the non-extensive parameter q, calculate the Tsallis entropy of each BIMF component, and obtain the feature entropy values. (4) Perform FCM cluster analysis on the entropy values to determine the fault type of the vibration signal.

Fig. 1 Flow diagram of the rolling bearing fault diagnosis method in this paper

5 Experiment

In this paper, the bearing test bench data of Case Western Reserve University are used for the experiments [4]. The bearing test bench is shown in Fig. 2. The platform consists of a 1.5 kW motor, a torque sensor/encoder, a dynamometer, and an electronic controller.

Fig. 2 Bearing experiment platform of Case Western Reserve University

Specifically, the experimental data come from the drive end bearing, a 6205-2RS JEM SKF deep groove ball bearing. The inner ring diameter is 25 mm, the outer ring diameter is 52 mm, the thickness is 15 mm, the rolling element diameter is 7.94 mm, and the pitch diameter is 39.04 mm. The faults are introduced artificially by electro-discharge machining (EDM). The vibration signals are acquired through accelerometers mounted on the motor housing under different faults, speeds, and load conditions, and are then analyzed to determine whether one or more faults exist on the bearing. In addition, the vibration signal is collected by a 16-channel data recorder, and the power and speed are measured by the torque sensor/encoder.

To verify the effectiveness of the proposed method, two cases are considered in the experiment: (1) diagnosis of different fault types for bearings with the same shaft diameter; (2) diagnosis of the same fault type for bearings with different shaft diameters.

5.1 Different types of fault diagnosis for the bearing with same shaft diameter

In this case, the chosen shaft damage diameter is 0.1778 mm, the speed is 1772 r/min, and the sampling frequency is 12 kHz. The bearing has four states: normal (NO), inner race fault (IR), outer race fault (OR), and rolling element fault (RE). To obtain a better diagnostic effect, the VMD parameters are first determined experimentally in this section, and then the effectiveness of the proposed diagnosis method is verified.

  (1) Parameter determination

When the original signal is decomposed by VMD, the scale value K needs to be preset. K affects the decomposition result, which in turn affects the feature extraction and diagnosis results. Therefore, an appropriate K must be set to prevent under-decomposition or over-decomposition. Here, a sample with an inner race fault is tested, and the sample length is 4096 points. The appropriate K is chosen by observing the center frequencies of the components obtained at different K values. When K = 5, the time-domain waveform and the spectrum of each component are shown in Fig. 3. The center frequencies of the BIMF components for different K values are shown in Table 1.

Fig. 3 The VMD decomposition result of the signal with inner race fault when K = 5

Table 1 The BIMF components center frequencies of signal with inner race fault at different K values

It can be seen from Table 1 that when K > 4, the center frequencies of the BIMF components change little. In particular, when K = 5, the center frequencies of BIMF4 and BIMF5 are similar, which shows that the signal is over-decomposed when K > 4. Conversely, when K < 4, the signal is under-decomposed: the component at 1499.1 Hz is missing when K = 3. Therefore, the best VMD decomposition scale is K = 4 for the signal with the inner race fault.

Next, VMD decomposition is performed at different K on the signals of the other three types, and the obtained center frequencies are shown in Table 2. When K = 4, neither over-decomposition nor under-decomposition occurs. Hence, K is set to 4 in the following experiments.

Table 2 The BIMF components center frequencies of signal with three different faults at different K

  (2) Fault diagnosis

In this part, 40 sets of signals are chosen as samples for each bearing state, with 2048 data points per set. The decomposition scale K is set to 4, and VMD decomposition is performed on the signals with different fault types. The Tsallis entropy of each decomposed component is then calculated. The results are shown in Fig. 4.

Fig. 4 The Tsallis entropy obtained by VMD decomposition of the signals with different faults

According to the obtained Tsallis entropy, a 160 × 4 matrix can be constructed (40 samples for each of the four bearing states, with K = 4 entropy values per sample), which is used as the feature in diagnosis. The FCM clustering results are shown in Fig. 5, and the center coordinates are shown in Table 3. Here, the number of cluster centers is c = 4, the fuzzy weighted index is m = 2, and the convergence precision is ε = 0.001.

Fig. 5 The clustering results of different fault signals

Table 3 The clustering center coordinates of different fault signals

To further test the diagnosis effect, the classification coefficient and the average fuzzy entropy are calculated, giving F = 0.95466 and H = 0.14579. F is close to 1 and H is close to 0. These results indicate that the FCM clustering performs well and that the proposed method, which combines VMD, Tsallis entropy and FCM clustering, is feasible for rolling bearing diagnosis.

5.2 Same type of fault diagnosis for the bearing with different shaft diameter

In this part, rolling bearings with four different shaft damage diameters are used for the experiment: D1 = 0.1778 mm, D2 = 0.3556 mm, D3 = 0.5334 mm, and D4 = 0.7112 mm. The speed is 1772 r/min, the sampling frequency is 12 kHz, and only the inner race fault of the rolling bearing is tested. As in the first part, 40 sets of signals are chosen as samples, with 2048 data points per set, and K is still set to 4. The Tsallis entropy of each decomposed component is shown in Fig. 6. The clustering result is shown in Fig. 7, and the cluster center coordinates are shown in Table 4 (the FCM parameters are the same as in the first part).

Fig. 6 The Tsallis entropy obtained by VMD decomposition of the inner race fault signals with different shaft diameters

Fig. 7 The clustering results of inner race fault signals with different shaft diameters

Table 4 The clustering center coordinates of inner fault signals with different shaft diameter

The classification coefficient and the average fuzzy entropy are F = 0.94598 and H = 0.15506. The proposed method is therefore also applicable to fault diagnosis of bearings with different shaft diameters.

In addition, to demonstrate the superiority of the method, it is compared with two other methods that follow almost the same process, except that VMD is replaced by EMD (EMD + FCM) or LMD (LMD + FCM). The resulting classification coefficient F and average fuzzy entropy H are shown in Table 5.

Table 5 The clustering effect of different methods

It can be seen that, in both cases, the proposed method yields the largest classification coefficient F and the smallest average fuzzy entropy H. The method proposed in this paper is therefore more advantageous for fault diagnosis of rolling bearings.

6 Conclusion

In this paper, a new method for rolling bearing fault diagnosis is proposed, which applies VMD for signal decomposition, uses Tsallis entropy as the signal feature, and finally applies the FCM algorithm for diagnosis. To verify the feasibility of the method, a series of experiments are performed, and the results are promising. Furthermore, compared with the EMD + FCM and LMD + FCM methods, the proposed method achieves the largest classification coefficient and the smallest average fuzzy entropy.