1 Introduction

Rotating machinery is an indispensable part of the equipment manufacturing industry, and rolling bearings are key components of rotating machinery. Even small defects (such as cracks and abrasion damage) may lead to catastrophic failure of the whole mechanical structure. Therefore, the timely and accurate classification of rolling bearing fault types has attracted many scholars [1,2,3].

The development of data-driven mechanical fault diagnosis benefits from the rapid progress of sensing, computing, and information storage technologies in recent years, which provide technical guarantees for data acquisition, transmission, and storage in manufacturing systems [4, 5]. However, the abnormal response of a rolling bearing caused by a fault is usually nonstationary. Owing to complex nonlinear behaviors, e.g., frictional contact between components, radial bearing clearance, and small vibrations, the raw time-domain signals exhibit not only transient phenomena but also nonlinear dynamic effects; noise and various uncertainties further exacerbate this situation [6]. Traditional fault diagnosis methods consist of three main phases: signal acquisition, feature extraction and selection, and fault classification or prediction. For example, Yuan et al. proposed high-fidelity noise-reconstructed empirical mode decomposition for extracting multiple and weak mechanical faults [7]. Qiao et al. used empirical mode decomposition, fuzzy feature extraction, and support vector machines to diagnose faults of steam turbine generator sets under three different working conditions [8]. Chen combined wavelet packet feature extraction with machine learning to build an online logistic regression monitoring model, verified on tool wear vibration signals [9]. However, these traditional methods have limitations. First, they rely on manually selected features, and the quality of the extracted features directly affects the performance of the final classification algorithm, so the stability of the diagnosis is unreliable. Second, in the current era of big data, large volumes of data must be processed quickly, and constructing features manually takes time and effort. Even when the extracted features are used to train a model, the efficiency of the final prediction or classification still needs improvement.

Meanwhile, to meet the challenges of the big data era, considerable research on intelligent fault diagnosis methods has been carried out in recent years. Deep learning models automatically learn nonlinear features and perform classification by stacking multiple network layers, and have achieved promising results in many artificial intelligence tasks. With the development of the internet industry, many sensors are deployed in the condition monitoring systems of mechanical equipment. The resulting explosion of data has made traditional fault diagnosis methods unable to meet market needs, which has also aroused interest in applying deep learning to fault diagnosis, for example stacked Auto-Encoders (SAEs) [10], Deep Belief Networks (DBNs) [11], Recurrent Neural Networks (RNNs) [12], and Generative Adversarial Networks (GANs) [13]. Owing to deep learning's powerful nonlinear feature mapping and its end-to-end learning advantage, bearing diagnosis results based on deep learning have improved significantly.

The convolutional neural network (CNN) is one of the representative deep learning algorithms; it contains convolutional layers, pooling layers, and fully connected layers. With these, a CNN can be trained directly on the raw time-domain vibration signal and identify the fault features hidden in it, so as to diagnose and classify different fault types; this effectively avoids subjective feature selection and human intervention. As such, fault diagnosis driven directly by the raw data has become attractive. Weimer et al. adopted a deep convolutional neural network, which avoids redefining manual fault characteristics for each new production situation and improves the automation and accuracy of monitoring [14]. Ince and Abdeljaber et al. used 1D CNNs to detect motor faults with higher accuracy than model-based methods [15, 16]. However, on one hand, the accuracy of a CNN depends heavily on large-scale datasets; for rolling bearing diagnosis, fault data are usually limited, and obtaining a large amount of training data is very costly. On the other hand, diagnostic performance declines when the training set contains multiple fault features, since the signals carry coupled composite fault characteristics from the dynamics of complex systems. Moreover, when the distribution of the test data deviates from that of the training data, the diagnostic performance of the network also declines; such differences arise from the different monitoring environments in which vibration signals are collected, e.g., different working loads, sensor positions, and rotation speeds. A CNN struggles to train its parameters on signals with multiple characteristics, yet actual working conditions are always changing, and the same fault type does not occur in a single form.

Therefore, this paper combines the multiwavelets transform with a CNN. Multiwavelets contain multiple basis functions in different frequency bands and can therefore extract multiple faults at the same time, while fully inheriting the properties of single wavelets such as orthogonality, symmetry, and compact support [17]. Moreover, the features extracted by the first layer of a convolutional neural network affect the diagnostic performance of the whole network: the quality of feature extraction directly determines the accuracy of fault identification, and the multiwavelets transform is itself a natural convolution with multiple wavelet basis functions. Replacing the first convolutional layer with a multiwavelets transform can therefore exploit the excellent attributes of the multiwavelets transform while remaining fully compatible with the network.

In addition, because multiwavelets basis functions resemble fault features, they can extract fault features from dynamic signals. This paper constructs customized multiwavelets layer parameters according to different input signals, so that the customized multiwavelets form the basis functions that best match each signal's characteristics. The CNN improved by multiwavelets reduces the number of training parameters, eases the dependence on big data, and alleviates the poor fitting of the classical CNN in multi-fault classification.

The rest of the paper is organized as follows. Section 2 expounds the basic theory of multiwavelets and the basic structure of convolutional neural networks. Section 3 describes the structure of the CNN improved by replacing its first layer with the multiwavelets transform. The experimental results are presented and compared with existing schemes in Sect. 4. Finally, we conclude the paper.

2 Basic Theory of Method

2.1 Multiwavelets

Multiwavelets Multiscale Analysis

Multiwavelets refer to wavelets generated by two or more scaling functions. The basic idea is to expand the multiresolution analysis space generated by a single wavelet using multiple scaling functions [18].

Consider the vector-valued function \(F(x) = (f_1 (x),f_2 (x), \cdots ,f_r (x))^T\); if \(f_j (x) \in L^2 (R),\ j = 1,2, \cdots ,r\), we write \(F(x) \in L^2 (R)^r\). If \(\Phi = (\phi_1 , \cdots ,\phi_r )^T \in L^2 (R)^r\) satisfies the two-scale relationship:

$$ \Phi (t) = \mathop \sum \limits_{k = 0}^N {\rm H}_k \Phi (2t - k) $$
(1)

where \(\{ H_k \} ,k = 0,1, \cdots ,N\) is an r × r two-scale matrix sequence, and \(\Phi\) is an r-order scaling function. The multiresolution analysis of multiplicity r generated by \(\Phi (x)\) is defined as:

$$ V_j = clos_{L^2 (R)} \left\{ {2^{j/2} \varphi_i \left( {2^j x - k} \right):1 \le i \le r,k \in Z} \right\} $$
(2)

\(V_j\) is a subspace with resolution \(2^j\).

If \(W_j\) is the complement subspace of \(V_j\) in \(V_{j + 1}\), and the vector function \(\Psi (x) = (\psi_1 ,\psi_2 , \cdots ,\psi_r )^T \in L^2 (R)^r\), then its dilations and translations construct a Riesz basis of the subspace \(W_j\)

$$ W_j = clos_{L^2 (R)} \left\{ {2^{j/2} \psi_i \left( {2^j x - k} \right):1 \le i \le r,k \in Z} \right\} $$
(3)

There is a matrix sequence \(\{ G_k \}_{k \in Z}\) such that \(\Psi (x) = (\psi_1 ,\psi_2 , \cdots ,\psi_r )^T\) satisfies the following two-scale relationship:

$$ \Psi (x) = \sum_{k = 0}^N {G_k \Phi (2x - k)} $$
(4)

where \(\{ G_k \} ,k = 0,1, \cdots ,N\) is an r × r two-scale matrix sequence, and \(\Psi\) is an r-order wavelet function. Studies on multiwavelets with multiplicity \(r > 2\) are rare; hence, \(r = 2\) is considered in this paper.

From the dilations of Eq. (1) and Eq. (4), the following recursive relationship between the coefficients \((c_{1,j,k} ,c_{2,j,k} )^T\) and \((d_{1,j,k} ,d_{2,j,k} )^T\) can be obtained.

$$ \left( {\begin{array}{*{20}l} {c_{1,j - 1,k} } \hfill \\ {c_{2,j - 1,k} } \hfill \\ \end{array} } \right) = \sqrt 2 \mathop \sum \limits_{n = 0}^N {{\varvec{H}}}_n \left( {\begin{array}{*{20}l} {c_{1,j,2k + n} } \hfill \\ {c_{2,j,2k + n} } \hfill \\ \end{array} } \right),\;\;\;\ j,k \in Z $$
(5)
$$ \left( {\begin{array}{*{20}l} {d_{1,j - 1,k} } \hfill \\ {d_{2,j - 1,k} } \hfill \\ \end{array} } \right) = \sqrt 2 \mathop \sum \limits_{n = 0}^N {{\varvec{G}}}_n \left( {\begin{array}{*{20}l} {c_{1,j,2k + n} } \hfill \\ {c_{2,j,2k + n} } \hfill \\ \end{array} } \right),\;\;\;\ j,k \in Z $$
(6)
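As an illustration, one level of the decomposition recursion in Eqs. (5) and (6) can be sketched in a few lines of NumPy. The function name and the periodic boundary handling are our own assumptions, not part of the original formulation.

```python
import numpy as np

def multiwavelet_decompose(c, H, G):
    """One level of multiwavelet decomposition (Eqs. 5-6).

    c : (r, L) array of approximation coefficient vectors at level j
    H, G : lists of (r, r) two-scale filter matrices
    Returns (c_low, d) at level j-1, each of shape (r, L // 2).
    """
    r, L = c.shape
    n_out = L // 2
    c_low = np.zeros((r, n_out))
    d = np.zeros((r, n_out))
    for k in range(n_out):
        for n, (Hn, Gn) in enumerate(zip(H, G)):
            idx = (2 * k + n) % L          # periodic boundary (an assumption)
            c_low[:, k] += np.sqrt(2) * Hn @ c[:, idx]
            d[:, k] += np.sqrt(2) * Gn @ c[:, idx]
    return c_low, d
```

With Haar-like filter matrices (averaging for H, differencing for G) this reduces to the familiar scalar wavelet transform applied componentwise.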

Hermite Spline Multiwavelets

Hermite spline multiwavelets are multiwavelets constructed from cubic Hermite spline functions, which are twice continuously differentiable. The multiscaling functions are supported on [0, 2] and have approximation order 4; the multiwavelet functions are supported on [0, 3] and have approximation order 2. The two-scale relationships are shown in Eq. (7) and Eq. (8).

$$ \Phi (x) = \left[ {\begin{array}{*{20}c} {\varphi_1 (x)} \\ {\varphi_2 (x)} \\ \end{array} } \right] = H_0 \Phi (2x) + H_1 \Phi (2x - 1) + H_2 \Phi (2x - 2) $$
(7)

where \(H_0 = \left[ {\begin{array}{*{20}c} \frac{1}{2} & \frac{3}{4} \\ { - \frac{1}{8}} & { - \frac{1}{8}} \\ \end{array} } \right],H_1 = \left[ {\begin{array}{*{20}c} 1 & 0 \\ \frac{1}{2} & \frac{1}{8} \\ \end{array} } \right],H_2 = \left[ {\begin{array}{*{20}c} \frac{1}{2} & { - \frac{3}{4}} \\ \frac{1}{8} & { - \frac{1}{8}} \\ \end{array} } \right]\).

$$ \begin{aligned} \Psi (x) & = \left[ {\begin{array}{*{20}c} {\psi _1 } \\ {\psi _2 } \\ \end{array} } \right] = G_0 \Phi (2x) + G_1 \Phi (2x - 1) + G_2 \Phi (2x - 2) \\ & + \,G_3 \Phi (2x - 3) + G_4 \Phi (2x - 4) \\ \end{aligned} $$
(8)

where

$$ \begin{aligned} & G_0 = \left[ {\begin{array}{*{20}c} {\frac{{67}}{{240}}} & {\frac{7}{{240}}} \\ { - \frac{{95}}{{972}}} & { - \frac{1}{{162}}} \\ \end{array} } \right],G_1 = \left[ {\begin{array}{*{20}c} { - 1} & {\frac{{187}}{{60}}} \\ {\frac{{89}}{{243}}} & {\frac{{91}}{{81}}} \\ \end{array} } \right],G_2 = \left[ {\begin{array}{*{20}c} {\frac{{173}}{{120}}} & 0 \\ 0 & {\frac{{26}}{9}} \\ \end{array} } \right],G_3 = \left[ {\begin{array}{*{20}c} { - 1} & {\frac{{187}}{{60}}} \\ { - \frac{{89}}{{243}}} & {\frac{{91}}{{81}}} \\ \end{array} } \right], \\ & G_4 = \left[ {\begin{array}{*{20}c} {\frac{{67}}{{240}}} & { - \frac{7}{{240}}} \\ {\frac{{95}}{{972}}} & { - \frac{1}{{162}}} \\ \end{array} } \right] \\ \end{aligned} $$
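For concreteness, the scaling filter matrices of Eq. (7) can be written out numerically and sanity-checked: integrating both sides of Eq. (7) shows that \((H_0 + H_1 + H_2)/2\) must have eigenvalue 1 (with the vector of integrals of \(\Phi\) as eigenvector). This is a standard consistency check, not a claim from the paper.

```python
import numpy as np

# Two-scale filter matrices of the cubic Hermite spline multiwavelets (Eq. 7).
H0 = np.array([[ 1/2,  3/4],
               [-1/8, -1/8]])
H1 = np.array([[ 1.0,  0.0],
               [ 1/2,  1/8]])
H2 = np.array([[ 1/2, -3/4],
               [ 1/8, -1/8]])

# Sanity check: (H0 + H1 + H2) / 2 must have eigenvalue 1.
eigvals = np.linalg.eigvals((H0 + H1 + H2) / 2)
assert np.any(np.isclose(eigvals, 1.0))
```

The wavelet filter matrices \(G_0, \ldots, G_4\) of Eq. (8) can be transcribed the same way.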

2.2 CNN

Convolutional Layer

The convolutional layer extracts data features by convolving its kernel parameters with the input data. Usually a convolutional layer has multiple convolution kernels; because each kernel shares its parameters during convolution, one kernel learns one class of features, called a feature map. To compute the output \(y^i\), the inputs \(x^1 ,x^2 ,...,x^d\) are first convolved with the convolutional kernels \(W^{i,d}\); the convolution results are then summed, and a scalar bias \(b^i\) is added. The output of the convolutional layer \(Z^i\) is obtained as

$$ Z^i = {{\mathbf{W}}}^i \otimes {{\mathbf{X}}} + b^i = \mathop \sum \limits_1^d W^{i,d} \otimes x^d + b^i $$
(9)

where \(\otimes\) represents the convolutional operation and \(W^i \in {\mathbb{R}}^{m \times n \times d}\) is the convolution kernel. Based on the nonlinear activation function, the output feature map \(y^i\) can be represented as.

$$ y^i = g\left( {Z^i } \right) $$
(10)

where \(g( \cdot )\) represents the nonlinear activation function. In this work, the convolutional layers adopt the rectified linear unit (ReLU) as the activation function.
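A minimal NumPy sketch of Eqs. (9) and (10) for one output feature map follows; as in most deep learning frameworks, "convolution" is implemented as cross-correlation (no kernel flipping), and the function name is our own.

```python
import numpy as np

def conv_layer(X, W, b):
    """Multi-channel 1-D convolution plus ReLU (Eqs. 9-10).

    X : (d, L) input with d channels; W : (d, m) kernel; b : scalar bias.
    Returns one output feature map of length L - m + 1.
    """
    d, L = X.shape
    _, m = W.shape
    # Sum over all channels and kernel taps at each valid position, add bias.
    Z = np.array([np.sum(X[:, t:t + m] * W) for t in range(L - m + 1)]) + b
    return np.maximum(Z, 0.0)   # ReLU activation g(.)
```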

Pooling Layer

The pooling layer subsamples the input data, extracting features while reducing the data dimension. Pooling includes max pooling and average pooling; max pooling generally performs best and can be described as follows.

$$ p^{1(i,j)} = \max\nolimits_{(j - 1)w < t < jw} \{ a^{1(i,t)} \} j = 1,2, \cdots ,q $$
(11)

where \(p^{1(i,j)}\) denotes the j-th output of the i-th feature map in layer 1, \(a^{1(i,t)}\) denotes the t-th neuron of that feature map, \(w\) represents the width of the pooling kernel, and \(j\) indexes the pooling kernel.

Fully Connected and Output Layers

The signal features extracted by the preceding layers are fed into the first fully connected layer and flattened into a one-dimensional sequence. Each value \(\lambda_c\) produced by the final output layer is input into a softmax function \(\varphi (\cdot)\), which is defined by

$$ \varphi \left( {\lambda_c } \right) = \frac{{e^{\lambda_c } }}{{\sum_{c = 1}^C e^{\lambda_c } }},c = 1, \cdots C $$
(12)

Here \(\varphi = [\varphi (\lambda_1 ), \cdots ,\varphi (\lambda_C )]\) is a C-dimensional probability vector representing the probability distribution over the C condition classes; the output of the softmax function gives the probability of the input signal belonging to each label.
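Eq. (12) can be sketched directly; shifting the logits by their maximum is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(lam):
    """Softmax of Eq. (12), computed in a numerically stable way."""
    e = np.exp(lam - np.max(lam))   # subtracting max(lam) avoids overflow
    return e / e.sum()
```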

3 The Presented Network Structure

Aiming at bearing faults, this paper improves the network framework based on CNN, as illustrated in Fig. 1. Features are extracted directly from the raw time-domain sensor signals by the multiwavelets layer of the improved CNN, and the extracted feature signal is passed to a deep network composed of one-dimensional convolutional and pooling layers. Since the feature signal after the multiwavelets transform is two-dimensional, the kernel of the following convolutional layer is a two-dimensional matrix; after this first convolutional layer the feature signal becomes one-dimensional, so the kernels of the remaining convolutional layers are one-dimensional. Finally, a fully connected (FC) layer and a multi-category output layer composed of the softmax function form the bottom of the network.

Fig. 1.
figure 1

The structure of the improved CNN by multiwavelets.

3.1 Multiwavelets Layer

The specific implementation steps of the multiwavelets layers are as follows.

Split sublayer: the signal \(f\) is divided into two new signals according to the even and odd positions of the data sequence; the even-position samples form \(p\) and the odd-position samples form \(q\):

$$ p(x) = f(2x) $$
(13)
$$ q(x) = f(2x + 1),\;\;\;\ x \in Z $$
(14)

Predict sublayer: the prediction operator \(\partial\) is convolved with the odd samples \(q\) to predict the even samples \(p\). The error \(\Delta\) between the predicted value and the real value is defined as the detail coefficients:

$$ \Delta = p - \partial \ast q $$
(15)

where \(\partial\) is the matrix vector of the prediction operator, and the symbol \(\ast\) represents matrix-vector convolution.

Update sublayer: the detail coefficients obtained by the predictor are passed to the update sublayer, convolved with the updater \(U\) composed of the parameter matrices, and the result is added to \(p\). The updated sequence \(p^U\) represents the vector of approximation coefficients:

$$ p^U = p + U \ast \Delta $$
(16)

where \(U\) represents the vector of two-dimensional matrices acting as the updater.
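The three sublayers above can be sketched as follows, implementing Eqs. (13)–(16) as written. The helper `mat_conv`, the dict-of-offsets representation of the operators, and the periodic boundary handling are our own assumptions for illustration.

```python
import numpy as np

def mat_conv(ops, x):
    """Convolve a dict {offset: (2, 2) matrix} with a (2, n) sequence (periodic)."""
    n = x.shape[1]
    out = np.zeros_like(x)
    for k, M in ops.items():
        for j in range(n):
            out[:, j] += M @ x[:, (j + k) % n]
    return out

def multiwavelets_layer(f, P, U):
    """Split / predict / update sublayers of the multiwavelets layer.

    f : (2, L) signal with even L; P, U : predictor and updater operators.
    Returns (approximation, detail) coefficient sequences.
    """
    p, q = f[:, 0::2], f[:, 1::2]     # split sublayer (Eqs. 13-14)
    delta = p - mat_conv(P, q)        # predict sublayer (Eq. 15)
    approx = p + mat_conv(U, delta)   # update sublayer (Eq. 16)
    return approx, delta
```

With zero predictor and updater the layer degenerates to plain even/odd splitting, which is a convenient correctness check.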

The customized multiwavelets layer is based on adaptive multiwavelets theory and integrates a biorthogonal perfect-reconstruction multifilter bank to process the input signal. The parameter \(k\) controls the matrix lowpass and highpass filters \(\{ H,G,\tilde{H},\tilde{G}\}\) as well as the predict and update operators \(\{ \partial ,U\}\):

$$ {{\varvec{G}}}(k) = k\left( {{{\varvec{I}}} - \partial \left( {k^2 } \right)/k} \right) $$
(17)
$$ {{\varvec{H}}}(k) = {{\varvec{I}}} + {{\varvec{U}}}\left( {k^2 } \right){{\varvec{G}}}(k) $$
(18)
$$ {\tilde{\user2{H}}}(k) = {{\varvec{I}}} + \partial \left( {k^2 } \right)/k $$
(19)
$$ {\tilde{\user2{G}}}(k) = \left( {{{\varvec{I}}} - {{\varvec{H}}}(k)k{{\varvec{U}}}\left( {k^2 } \right)} \right)/k $$
(20)

To ensure the linear phase of the prediction filters, symmetry is required; the operator \(\partial\) of the predict sublayer is therefore subjected to the symmetry condition

$$ \partial (0) = \left( {\begin{array}{*{20}c} \frac{1}{2} & \frac{1}{4} \\ c & { - \frac{1}{4}} \\ \end{array} } \right),\;\;\;\ \partial ( - 1) = \left( {\begin{array}{*{20}c} \frac{1}{2} & { - \frac{1}{4}} \\ { - c} & { - \frac{1}{4}} \\ \end{array} } \right) $$
(21)

Next, the operator \(U\) of the update sublayer, closely related to \(\partial\), can be calculated as

$$ {{\varvec{U}}} = \{ U(0),U(1)\} ,\;\;\;\ U(0) = \partial ( - 1)/2 = \left( {\begin{array}{*{20}c} \frac{1}{4} & { - \frac{1}{8}} \\ { - \frac{c}{2}} & { - \frac{1}{8}} \\ \end{array} } \right),\;\;\;\ U(1) = \partial (0)/2 = \left( {\begin{array}{*{20}c} \frac{1}{4} & \frac{1}{8} \\ \frac{c}{2} & { - \frac{1}{8}} \\ \end{array} } \right) $$
(22)

Equation (22) shows that the parameter \(c\) affects not only the predictor matrices \(\partial\) but also the updater matrices \(U\). In other words, by Eq. (21), the free parameter \(c\) changes the multiscaling and multiwavelets functions.
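Eqs. (21) and (22) can be transcribed directly: given the free parameter \(c\), both operators are fully determined. The function name and the dict representation (keyed by sample offset) are our own conventions.

```python
import numpy as np

def predictor_updater(c):
    """Build predictor P (Eq. 21) and updater U (Eq. 22) from free parameter c."""
    P = {0:  np.array([[ 1/2,  1/4],
                       [ c,   -1/4]]),
         -1: np.array([[ 1/2, -1/4],
                       [-c,   -1/4]])}
    U = {0: P[-1] / 2,   # U(0) = ∂(-1) / 2
         1: P[0]  / 2}   # U(1) = ∂(0)  / 2
    return P, U
```

For example, `predictor_updater(-2)` gives the operators corresponding to the best-performing parameter value found in Sect. 4.2.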

3.2 Multiwavelets Layer Parameters

The customized multiwavelets layer changes its kernel function with the input signal so as to match different fault features more accurately. Table 1 shows the important parameters of the CNN improved by multiwavelets; each input sample contains 1024 sampling points. The shallow convolutional layer is replaced by a multiwavelets layer with a kernel size of 10 and a single output channel, which greatly reduces the number of shallow training parameters and improves the overall training and convergence speed of the network. The fault features extracted by the multiwavelets layer are passed to the convolutional layers, where the fault information is mined further. Two convolutional layers are used; Conv1D denotes a one-dimensional convolution between the kernel and its input. The following adaptive max pooling layer uses 16 kernels, and the feature signal is then passed to the fully connected layer. The final output dimension is \(m\), consistent with the number of input label types.

Fig. 2.
figure 2

The flowchart of improved CNN by multiwavelets.

Figure 2 shows the data flow of the CNN improved by multiwavelets, described as follows.

  • Step 1: In the data preprocessing stage, the collected time-domain signals containing the various fault types are divided into a training set and a test set at a ratio of 6:4.

  • Step 2: In the training stage, the fault time-series signals of different types are packaged into the same two-dimensional matrix, and the improved CNN is initialized.

  • Step 3: According to the fault signal type, the parameter \(c\) is set to constrain the matrices \(\partial\) and \(U\); the kernel parameters of the multiwavelets layer are thereby determined.

  • Step 4: The data is processed by the split, predict, and update sublayers to extract fault features.

  • Step 5: The fault features extracted by the shallow layer are passed through two one-dimensional convolutional layers and an adaptive max pooling layer, so that the fault characteristics of the input signals are fully mined.

  • Step 6: The softmax function converts the results into a probability for each feature label. Meanwhile, the gradient of each layer is computed by backpropagation to continuously update the network parameters and improve the training accuracy.

  • Step 7: The loss is calculated with the cross-entropy loss function to judge whether the network has finished training.

  • Step 8: The well-trained network model is applied to the testing data.

  • Step 9: The fault type label of the input signal is identified.

Table 1. The parameters of the improved CNN model.

4 Experiment Verification

4.1 Experiment Description

In this section, the CNN improved by multiwavelets is applied to the bearing dataset of Case Western Reserve University (CWRU) for verification [19]. Vibration signals are collected at 12,000 samples per second (12 kHz) through two accelerometers (sensors) installed at the drive end of an electric motor on the test rig under four different conditions. Table 2 shows the detailed working-condition data of the ten fault states. The damage diameters of the three basic faults are 0.007 in., 0.014 in., and 0.021 in. The collected signal of each fault state is divided into 100 samples, each containing 1024 points, and the samples are allocated to training and testing sets at a ratio of 3:2.
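The segmentation just described can be sketched as follows; the function name and the contiguous, non-overlapping windowing are our own assumptions about the preprocessing.

```python
import numpy as np

def segment_signal(signal, n_points=1024, n_samples=100, train_ratio=0.6):
    """Cut a long vibration record into fixed-length samples and split 3:2.

    Mirrors the preprocessing described for the CWRU data: 100 samples of
    1024 points per fault state, 60 for training and 40 for testing.
    """
    samples = np.array([signal[i * n_points:(i + 1) * n_points]
                        for i in range(n_samples)])
    n_train = int(n_samples * train_ratio)
    return samples[:n_train], samples[n_train:]
```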

Table 2. Rolling bearing operation state

4.2 Parameter Optimization of Multiwavelets Layer

During training, the network initializes its parameters randomly, so the initial values are uncertain and the feature extraction ability of the multiwavelets layer may not be fully exhibited. Figure 3 shows how the loss function decreases during iterative training for different values of the parameter, comparing training speed and fitting quality. It can be observed that when \(c = - 2\), the training speed and the stability of parameter fitting are best. Table 3 lists the average fault classification accuracy of the multiwavelets layer under different parameter values; the network performs best when \(c = - 2\). Therefore, with \(c = - 2\) the network model achieves the best speed, stability, and diagnostic accuracy for multi-fault classification.

Fig. 3.
figure 3

Changes of cross-entropy loss in different parameters.

Table 3. Average accuracy under different parameter values.

4.3 The Performance of Bearing Fault Diagnosis

A Multilayer Perceptron (MLP) and a 1D CNN are compared with the presented solution; the improved CNN uses \(c = - 2\) as the multiwavelets parameter. The MLP consists of four layers: an input layer, an output layer, and two hidden layers. The 1D CNN has the same convolutional layer parameters as the improved CNN, with a shallow kernel size of 10. The two comparison schemes thus follow the principle of controlling a single variable.

Figure 4 shows that the losses of the CNN and the improved CNN decrease rapidly in the first five iterations and converge by the seventh. The MLP struggles to converge fully in the multi-fault classification experiment, and its cross-entropy loss fluctuates greatly, showing that the MLP has obvious shortcomings in nonlinear fitting. The CNN shows a good training effect on the multi-fault nonlinear fitting problem; however, its large fluctuations at the 14th, 37th, and 38th iterations indicate that its training parameters are unstable across runs. The CNN improved by multiwavelets not only inherits the excellent nonlinear fitting ability of the CNN for multi-fault problems but also benefits from the spatial-temporal features extracted by the multiwavelets, so the model maintains better stability and reliability over multiple iterations than the classical CNN.

Fig. 4.
figure 4

Variation of loss value in different experimental schemes.

Table 4. Classification results of the compared methods.

The accuracy of these methods on the common dataset is listed in Table 4. Each of the three schemes was trained five times, and the trained parameters of each run were verified on the test set; the average accuracy shows that the improved CNN has a stronger ability to extract fault features. Figure 5 visualizes the results of the three methods on the 400 test samples: panel (a) shows the confusion matrix heat map of the CNN improved by multiwavelets, and panels (b) and (c) show those of the CNN and MLP, respectively. These figures indicate that the improved CNN achieves higher fault classification accuracy.

Fig. 5.
figure 5

Sample classification in validation set.

5 Conclusion

The CNN method relies on training with a large amount of data of the same fault and updating network parameters through backpropagation to establish recognition of that fault type. Under actual working conditions, however, bearing fault types change constantly, and it is difficult to collect sufficient data of the same type. With small datasets, the first layer of a CNN struggles to extract effective deep features, which degrades the performance of the entire network. Therefore, a CNN improved by multiwavelets is proposed in this paper. The first layer of the model is the multiwavelets transform from signal processing; the multiwavelets layer fully inherits the multiwavelets' ability to quickly extract feature signals with multiple wavelet bases. Raw time-domain signals are fed directly into the improved CNN, where the multiwavelets layer acts as a multi-channel filter that extracts multiple fault features simultaneously; the extracted features are then fused as inputs to the next layer. By enhancing shallow feature extraction, the network extracts fault features accurately while reducing the number of network parameters. With multiwavelets feature extraction, the proposed method can diagnose faults accurately from smaller datasets in practical bearing diagnosis, and by customizing the multiwavelets layer parameters, the influence of shallow parameters on the overall network performance is examined. The validity of the method is verified on public experimental datasets and by comparison with traditional and classical methods.

Based on replacing the shallow layer of the CNN with a multiwavelets model, this paper mainly finds, through parameter optimization, the multiwavelets layer parameters best suited to the fault characteristics. In future work, the multiwavelets layer parameters could be adaptively matched to fault characteristics to improve diagnostic efficiency, and the underlying logic of the CNN training parameters could be discussed based on the physical meaning of the multiwavelets layer.