1 Introduction

As one of the most essential components, rolling bearing is widely used in various mechanical systems, and its running state determines the machinery’s RUL. The failure of bearings can result in serious accidents and major economic loss. Therefore, to ensure the safe and reliable operation of a mechanical system, RUL prediction of bearings is indispensable.

With the increasing complexity of sampling data, supervised data-driven methods have been extensively researched [1,2,3]. The core idea is to train neural network models using HI and RUL labels, which can then be used to predict the RUL of bearings [4,5,6]. Therefore, HI performance significantly influences RUL prediction, and the construction of HI is generally based on monitoring signals. Different HI construction strategies can be divided into physical HI (PHI) and fusion HI (FHI) [7]. PHI can be obtained through a simple signal processing method, such as root mean square (RMS) and kurtosis of a bearing vibration signal, which contains some physical meanings. In [8], the RMS is employed as PHI to indicate the degradation trend of bearings. Meanwhile, FHI is constructed by fusing or reducing the dimensionality of physical features, which can reflect the degradation process but has no real physical significance. For example, Gao et al. [9] extracted time-domain features and fused the data to build an HI; they then predicted the bearing RUL using the Metropolis–Hastings algorithm. In addition, Tian et al. [10] combined four selected features into one HI to characterize the life state. The above HI performed well and achieved good prediction. Based on these studies, this paper proposes a new HI construction method. First, the approach extracts similarity features based on the correlation between statistical features and time-series data. Then, it combines the time–frequency domain features to build an HI, which can effectively mine useful degradation information and represent the health states of bearings.

HI quality is an important factor affecting the RUL prediction accuracy; another essential factor is the learning capability of intelligent computational algorithms. With the improvement in computing power, deep learning (DL) [11], including long short-term memory (LSTM) [12, 13], deep belief network (DBN) [14], and CNN [15,16,17], has become a hot research topic. Ding et al. [18] considered the transition of different degradation stages and proposed one- and multi-stage iteration prediction models based on the LSTM neural network for RUL prediction. The experimental results showed the competitive performance of the LSTM neural network. DBN, one of the critical approaches for big data processing and analysis, can also be used to predict RUL. For example, Jason et al. [19] used DBN to predict the RUL of rotating equipment, which explore the self-taught feature-learning capability of DBN. Among the various DL models, a CNN attracts special attention because of its feature extraction ability, which is widely used in recognition and detection problems in the field of computer vision. [20,21,22] In addition, 1-D CNNs have recently been successfully applied to speech recognition and document reading [23,24,25]; moreover, the utilization of CNN-based models shows considerable potential in RUL prediction [25,26,27,28]. Compared to a deep neural network, a CNN uses fewer parameters to achieve the same functionality or precision [29]. Cheng et al. [30] exploited the adoption of CNNs in predicting the bearing RUL, and the experimental results demonstrated a superior performance of the proposed CNN model. Hence, the current paper focuses on constructing significant bearing HI which can be utilized to define the bearing failure stages. And proposes a 1-D CNN model to mine the underlying information of the HI, learning the degradation of bearings according to the HI and RUL labels. The model’s main function is to reveal the hidden dependence between an HI and training bearing RUL by using the feature extraction ability of CNNs for RUL prediction.

This study tests the bearing RUL by using two DL methods: a CNN and an LSTM network model, which are also applied to validate the effectiveness of the proposed approach.

The viability of the proposed approach is proved using the FEMTO bearing datasets [31] and the Intelligent Maintenance System (IMS) bearing dataset [32]. The following contributions can be summarized:

  1. 1.

    A related similarity (RS) feature construction approach, used for both time and frequency domains, is presented based on the similarity between the current data and original operating-point data. The RS features are built to mine valid degradation information, ranging from 0 to 1, so that normalization can be omitted.

  2. 2.

    A special HI based on the RS feature and principal component analysis (PCA) dimension reduction is constructed. This can effectively illustrate the degradation in bearing operation.

  3. 3.

    A CNN model that inputs the HI of rolling bearings is presented, which improves the accuracy of RUL estimations.

  4. 4.

    The potential of the proposed method is demonstrated using the experimental datasets. The results show that CNNs perform better than the LSTM prognosis approach.

The novelty of the proposed method includes (1) the HI based on the RS feature and PCA can effectively reflect the health status of bearings; and (2) the special HI significantly reduces the complexity of the CNN model and improves the accuracy of predicting the RUL of bearings.

The remaining paper is organized as follows. Section 2 describes the proposed approach; Sect. 3 presents the experimental verification of the proposed methodology and provides the results. Finally, Sect. 4 concludes the study.

2 Proposed approach

The framework of the proposed method is based on the run-to-failure data of rolling bearings: the time, frequency, and time–frequency domain features are first extracted to construct similarity features, and then sensitive features are screened from them. The HI using the data fusion method to construct is to characterize bearing degradation status. Finally, the bearing RUL is predicted using a data-driven model.

Figure 1 shows the flowchart of the proposed method. First, 23 features in the time, frequency, and time–frequency domains are extracted to form the characteristic time series of rolling bearing vibration signals. Then, the similar characteristics of the time and frequency domains are constructed based on the Pearson correlation coefficient-which, together with the time–frequency domain features, constitute 14 bearing degradation features. Next, the features are sorted by the Cori index according to monotonicity and tendency, and the features with scores exceeding the preset threshold are selected. Using PCA, these features are combined to construct the HI. Finally, a CNN degradation model is constructed, and the HI is used as its input to predict the RUL of rolling bearings. The implementation process is detailed in Sects. 2 and 3. The RUL prediction method is verified using the whole life data of bearing vibration acceleration signals in FEMTO bearing datasets.

Fig.1
figure 1

Process of remaining life CNN prediction method based on similarity feature fusion

2.1 Data acquisition

In this paper, FEMTO bearing datasets [31] and IMS bearing datasets [32] have been used to analyze the degradation of the bearings and estimate their RUL. Details of the dataset are provided in Sect. 3.

2.2 Feature extraction

For the vibration signal x = [x1, x2,…,xn] collected in each sampling period, 10 traditional time-domain statistical characteristics are extracted (Appendix).

In the frequency domain, FFT transforms the time-domain signal x = [x1, x2,…,xn] into the frequency-domain signal s = [s1,s2,…,snfft]. FFT is used to extract the complete frequency spectrum of the time-domain signal, which is divided into four sub-band spectra, and the amplitudes of this spectrum and the four sub-band spectra are applied as five frequency-domain features. In the time–frequency domain, to obtain the nonlinear characteristics of the vibration signal more comprehensively, multilevel wavelet packet decomposition is used to extract the time–frequency features. In this study, the Haar wavelet is used to decompose the vibration signal into a group of wavelet packet nodes through three-level wavelet packet decomposition, which divides the frequency axis into eight frequency bands, and the energy ratio of each band is exploited as eight time–frequency features.

2.3 Similarity feature construction based on pearson’s correlation coefficient

Pearson’s correlation coefficient can better reflect the correlation between feature time-series data, which is widely used in similarity measurement [33]. Hence, it is used as the basis for calculating time- and frequency-domain RS features, the calculation method is according to formula (1):

$$R = \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)}}{{\left[ {\mathop \sum \nolimits_{i = 1}^{N} \left( {x_{i} - \overline{x}} \right)^{2} \mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - \overline{y}} \right)^{2} } \right]^{1/2} }}$$
(1)

Here, R lies in the interval [-1,1]. When the Pearson correlation coefficient is close to 1, the correlation between the x and y sequences is greater than the coefficient close to 0. To describe the correlation between the features more directly, the absolute value of R is adopted to construct the RS features of the time and frequency domains. Let ft indicate the feature series sampled at time t and f0 indicate the time series of the initial observation time; then, the RS features can be calculated according to formula (2).

$$RS = \frac{{\left| {\mathop \sum \nolimits_{i = 1}^{N} \left( {f_{0}^{i} - \overline{{f_{0} }} } \right)\left( {f_{t}^{i} - \overline{{f_{t} }} } \right)} \right|}}{{\left[ {\mathop \sum \nolimits_{i = 1}^{N} \left( {f_{0}^{i} - \overline{{f_{0} }} } \right)^{2} \mathop \sum \nolimits_{i = 1}^{N} \left( {f_{t}^{i} - \overline{{f_{t} }} } \right)^{2} } \right]^{1/2} }}$$
(2)

Type:

N – length of each feature time series;

\(\overline{f}_{0}\)– the average of \(\{ f_{0}^{i} \}_{i = 1:N}\);

\(\overline{f}_{t}\)– the average of \(\{ f_{t}^{i} \}_{i = 1:N}\).

If ft is equal to f0, RS is 1, and the bearing is in the initial health state. When t increases gradually, the time series and initial observation time series show a different trend. At this moment, RS decreases continuously until it reaches 0, and the bearing is in a state of degradation.

After the similarity feature calculation method is applied in 10 traditional time-domain features, one time-domain RS feature, named RS1, is acquired. Using the same method, we obtain five frequency-domain RS features, which are marked as RS2–RS6. We consider them and the eight time–frequency domain features as the original feature set. All features are shown in Table 1.

Table 1 Original feature set

2.4 Sensitive features selection

In the original feature set, there are redundant features that cannot completely describe the life state of the bearings; therefore, the sensitive features is screened from the original feature set to select the features that can accurately reflect the degradation process of the bearing. To construct a suitable HI, the trend and monotonicity of the original feature are quantitatively evaluated. The trend can assess the linear correlation between the feature and observation time. The greater the trend, the more the feature tends to degrade over time, and the greater the correlation with the true value. Monotonicity can evaluate the nature of a feature’s increasing or decreasing trend on the time axis. The greater the monotonicity, the better is the monotonicity of the feature; otherwise, the feature is oscillating.

For the feature sequence f = [f1,f2,…,fk] and time sequence t = [t1,t2,…,tk], ft is the corresponding degradation feature value at time t, where K is the largest sample number value. The specific calculation method of Trend is shown in Formula (3):

$$Trend\left( {F,T} \right) = \frac{{\left| {\mathop \sum \nolimits_{k = 1}^{K} \left( {f_{t} - \overline{F}} \right)\left( {t_{k} - \overline{T}} \right)} \right|}}{{\sqrt {\mathop \sum \nolimits_{k = 1}^{K} \left( {f_{t} - \overline{F}} \right)^{2} \mathop \sum \nolimits_{k = 1}^{K} \left( {t_{k} - \overline{T}} \right)^{2} } }}$$
(3)

Type:

N – length of each feature time series;

\(\overline{F}\)– the average of \(\{ f_{i} \}_{i = 1:K}\);

\(\overline{T}\)– the average of \(\{ t_{i} \}_{i = 1:K}\).

The specific calculation method of monotonicity is shown in formula (4):

$$Mon\left( F \right) = \left| {\frac{dF > 0}{{K - 1}} - \frac{dF < 0}{{K - 1}}} \right|$$
(4)

Type:

dF–Differentiation of characteristic sequences.

Both trend and monotonicity are in the range [0, 1]. To measure the original features, this paper uses the Cori index to screen the sensitive feature. The specific calculation method is shown in formula (5):

$$Cori = \frac{Trend + Mon}{2}$$
(5)

The Cori index for each original feature is calculated according to the ranking of the Cori values, and the corresponding features above the set threshold are screened out to form a sensitive feature set. Hence, features that have good representation capabilities of bearings are selected.

Feature extraction is carried out for time-series data within T time units, and N feature time series on a sensitive feature set constitute a feature matrix XT × N; then,\(F_{{\{ 1:T\} }}^{i}\) is used to represent the feature time series of the ith column of XT × N.

2.5 HI construction based on PCA

To quantitatively evaluate the degradation state of the bearings, the HI is constructed based on the whole bearing life cycle. In the HI construction process, PCA is a widely used method in data fusion and feature extraction [34], which can screen out the interaction information of features on sensitive feature sets. Through orthogonal transformation, the original features are projected to a few comprehensive features; therefore, a small amount of data can be used to represent the main information features implied by the input data to achieve dimensionality reduction. The HI construction approach based on PCA can be described as follows:

  1. (1)

    To address the problem of inconsistent data weights caused by the excessive difference in sensitive feature dimensions, the maximum and minimum normalization method is used to normalize XT × N, which is indicated as \({X}_{norm}^{T\times N}\).

    $$X_{norm}^{T \times N} = \frac{{x - {\text{min}}\left( x \right)}}{{\max \left( x \right) - {\text{min}}\left( x \right)}}$$
    (6)
  2. (2)

    The covariance matrix C of \({X}_{norm}^{T\times N}\) is identified.

    $$C = \mathop \sum \limits_{i = 1}^{n} (\left( {X_{i} - \mu } \right)\left( {X_{i} - \mu } \right)^{T} )$$
    (7)

    where \({X}_{i}\),are the variables of the data set\({X}_{norm}^{T\times N}\), and μ is the mean vector.

  3. (3)

    The eigenvalue \({\lambda }_{i}\) of C and the corresponding eigenvector \({v}_{i}\) are calculated, and the relationship is as follows.

    $$Av_{i} = \lambda_{i} v_{i} , i = 1,2,3, \ldots ,n$$
    (8)

    The eigenvalue is the variance of the first principal component (PC1), and the eigenvector is the column vector of the transformation matrix.

  4. (4)

    The eigenvalues are sorted, the largest K eigenvalues are selected, and the corresponding K eigenvectors are used as row vectors to form the eigenvector matrix P.

  5. (5)

    The data are converted to a new space constructed by K feature vectors, and Y{1:T} = [y1,y2,…,yT] is obtained through Y = PX, which is the principal component acquired by PCA fusion.

  6. (6)

    The maximum and minimum normalization processing is performed on PC1, which is applied as a dimensionless index that maps the result to 0–1. As the HI in this paper, PC1 does not have any physical meaning but has good monotonicity and trend, which can reflect the performance degradation trend of the rolling bearings more intuitively and significantly.

2.6 One-dimensional CNN for RUL prediction

A CNN is a representative DL algorithm, which is a feed-forward neural network inspired by the local receptive field in biological neurology. Among an increasing number of variants of the CNN model, a 1-D CNN is a suitable model for processing 1-D signals in the industrial field [35].

In general, the one-dimensional CNN structure is mainly composed of the convolutional layer, the pooling layer, and the full connection layer, which exhibit the characteristics of local connection, weight-sharing, pooling, and multilayer structure [36].

(1) Convolutional layer

Frist is the input layer, which mainly includes the convolutional layer shown in Fig. 2(a). Each convolutional layer consists of a set of learnable kernels K (also known as the filter). The convolved feature is calculated by the dot product between the weight of the filter and the perceived region of input data, which can be expressed as:

$$y_{i}^{l} = f\left( {\mathop \sum \limits_{i}^{k} K_{i,j}^{l} *X_{i}^{n} + b_{j}^{l} } \right)$$
(9)

where \({X}_{i}^{n}\) is the input data in the first layer n. \({K}_{i,j}^{l}\) and \({b}_{j}^{l}\) denote the weight and bias vectors respectively. \({y}_{i}^{l}\) is the output of the jth feature map channel in the layer 1. f (·) is the nonlinear activation function that determines if the neuron is awake. The Rectified Linear Unit (ReLU) is used for neurons in the convolutional layer and is widely employed in pattern recognition. Its ability to accelerate the convergence and mitigate the vanishing gradient problem has been proven [35]. The ReLU can be represented as follows:

$$f\left( x \right) = {\text{max}}\left( {0,x} \right)$$
(10)
Fig. 2
figure 2

One-dimensional CNN structure diagram

(2) Pooling layer

In the second part, the pooling layer can compress the data volume and improve the computing efficiency after the convolutional layer, which is shown in Fig. 2(b). The type of pooling layer includes maximum pooling, average pooling, random pooling, and sum area pooling. Since the maximum pooling can get faster convergence and improve generalization, the maximum pooling is utilized in this paper, and its mathematical model can be represented as:

$$p_{i}^{l} = {\text{max}}\left( {w \cap y_{i}^{l} } \right)$$
(11)

where \({y}_{i}^{l}\) and \({p}_{i}^{l}\) represent the channel input and output of the pooling layer. W represents the pooling window, which can slide with a certain step. ∩ represents the overlap between the pooling window and the channel output of relu. max(·) represents the operation that takes the max of input data.

(3) Fully connected layer

The third part is composed of a fully connected layer and is used to further extract the features and connect with the output layer after the convolutional layer and pooling layer. Figure 2 shows the above-described topology structure of CNN. The neurons in the fully connected layer are interconnected according to the following calculation:

$$y_{i} = {\text{f}}\left( {w_{i} {*}p_{i} + b_{i} } \right)$$
(12)

where \({y}_{i}\) and \({p}_{i}\) represent the output and input of the fully connected layer respectively. \({w}_{i}\) represents the weight matrix, and \({b}_{i}\) is the bias. f(·) is the activation function in the fully connected layer.

The prediction process of the bearing is based on 1-D time-series signals. The training model is affected by an accurate addition of labels to the data [37]. Therefore, after the segmented processing of the HI, RULs are added to the original data as labels by using the constructed HI value.

The run-to-failure data sample of the bearing is considered as the training set{xt, yt}, and the HI values are taken as the xt of the training set; the percentage of the bearing RUL at t is applied as the corresponding label yt. Assuming that RUL1 is the whole life of the bearing, the RUL at time t is represented as RULt. The calculation of the RUL percentage is as shown in formula (13):

$$y_{t} = \frac{{RUL_{t} }}{{RUL_{1} }}$$
(13)

When t is 0, yt is 1, which indicates that the bearing has just begun to degenerate. When yt is 0, it means that the bearing has completely failed, and the RUL is 0 at the moment.

The HI divided into 10 time steps and the RUL label as the 11th time step are directly input into the 1D-CNN model, and the output of the model is the predicted RUL value, which reflects the input vibration signal moment of the bearing. The step size is set as 1 to train the CNN model. Assuming that the HI includes 2803 values, the dimensions of the input sequence it will split are 2793 × 10. In this paper, the convolution kernel of the CNN model is set to 64, the kernel size is 2, and the pooling layer size is 2.

3 Experimental study and result discussion

Two datasets are used to validate the proposed approach: FEMTO bearing datasets [31] and IMS dataset [32]. First, the sensitive features are extracted from the raw data, and then an HI is constructed to train the CNN model for RUL estimation. The details of the datasets and the experiments are revealed in the following subsection.

3.1 Case study on the FEMTO bearing datasets

3.1.1 Data description

The bearing testbed is shown in Fig. 3. The proposed method is tested using the public FEMTO-ST bearing dataset from the PHM Data Challenge held by the Institute of Electrical and Electronics Engineers (IEEE) in 2012, which covers the vibration acceleration data under three load conditions (Table 2): (1)1800 rpm and 4000 N, (2)1650 rpm and 4200 N, (3)1500 rpm, and 5000 N. Under each working condition, there are two run-to-failure bearings, which are considered as training bearings, and the remaining bearings are considered as test bearings. The data sampling frequency is 25.6 kHz; sampling is performed every 10 s, with each sampling lasting for 0.1 s; and 2560 vibration data points are recorded at a time. The test bench and test description are detailed in reference [31].

Fig. 3
figure 3

Data acquisition test platform of FEMTO bearing datasets

Table 2 FEMTO bearing datasets

To validate the proposed approach, Bearings 1_1, 2_1, and 3_1 are selected as the training sets, and 11 are randomly selected as testing sets, including 1_2 to 1_7, 2_2 to 2_5, and 3_3.

3.1.2 Feature extraction and selection

The horizontal vibration signals of 14 bearings are preprocessed and the original signals are filtered using the wavelet denoising method. To eliminate the influence of periodic changes and random fluctuations, the original features are processed with a moving average (MA) of a window 100; then, normalized and eliminated outliers are processed. Taking Bearing 1_2 as an example, the visualized image of the extracted and fused features is shown in Fig. 4.

Fig. 4
figure 4

Visualized image of initial features of bearing 1_2

The two indicators of the 23 features in Bearings 1_1 and 1_2 are shown in Fig. 5.

Fig. 5
figure 5

Monotonicity and trend index of bearings 1_1and1_2

The Cori index of the 14 features of Bearing 1_1 is calculated and sorted; the result is shown in Fig. 6. The threshold is set as 0.5. The features higher than the threshold are TF5, TF1, TF6, TF3, TF4, and RS2, which form the sensitive feature set.

Fig. 6
figure 6

Cori index screening of bearing 1_1

3.1.3 HI construction

The result of the fusion of the features in the sensitive feature set based on PCA is shown in Fig. 7, which is the HI of Bearing 1_1.

Fig. 7
figure 7

HI of bearing 1_1

The RMS features used to calculate the stop threshold of the rolling bearing test are compared to verify the effectiveness of the HI. As mentioned in the International ISO 2372 Mechanical Vibration Industry Standard, for medium-sized machinery, when the RMS of the vibration signal reaches 2.0–2.2 g, it indicates that the equipment is in a dangerous state [38]. Figure 8 shows the vibration signal RMS and HI of Bearings 1_1 and 1_2 in the whole life cycle. Figure 8(a) shows that RMS cannot effectively characterize the performance degradation state of the bearing. For the HI constructed using this method, as shown in Fig. 8(b), the HI value scale range of the two bearings is roughly the same throughout the life cycle. Relative to Bearing 2, whose RMS cannot characterize the performance degradation state, HI can sense its early degradation and characterize its degradation state.

Fig. 8
figure 8

RMS and HI of bearings 1_1 and1_2

Figure 9 shows the HI constructed by the sensitive feature selection method for seven bearings. When the HI value exceeds 0.5, it means that the test bearing is in the normal state. When the HI value ranges from 0.2 to 0.5, the test bearing is in the initial degradation stage. When it is less than 0.2, the test bearing is in the rapid degradation stage until complete failure.

Fig. 9
figure 9

Bearing HI under similar operating conditions

3.1.4 RUL prediction and comparison with different model

The proposed 1-D CNN model is used to predict the bearing RUL, and the vibration signal acquired at each sampling time of Bearing 1_1 is utilized to construct the HI. After the HI is constructed, it is applied to train the CNN model and predict RUL. As shown in Fig. 10, the abscissa is the current time and the ordinate is the RUL percentage. The figure shows that the RUL curve of the bearing predicted by the CNN model is close to the actual remaining life curve. When the observation time is approximately 10000S, the curve has evident fluctuations. Compared to the bearing HI, the results predicted by the feature fusion-based CNN method can perceive the initial degradation of the bearing. The prediction result fluctuates up and down in the true RUL at the end of the lifetime, which is close to the truth RUL as a whole, showing a downward trend.

Fig. 10
figure 10

Prediction effect map of bearing 1_1

To verify the effectiveness and superiority of RUL prediction acquired by the CNN model, the model is compared with the LSTM model and the Bi-LSTM model. LSTM is a special memory neural network that can consider the memory data of current and historical periods to achieve the prediction, which can extract time-series features after training and solve long-term dependence problems [39]. Similar to LSTM, Bi-LSTM is a corresponding variant of the new recursive network architecture; both of them function as solutions to long-term dependency problems [40].

Taking Bearings 1_3, 1_5, 1_7, 2_2, 2_4, and 3_3 as examples, the HIs are input into the trained CNN, LSTM, and Bi-LSTM to obtain the truth RUL fit curves and the predicted RUL values for each time point. This paper applies MA to smoothen the predicted curves to visually display the curve of the predicted RUL values acquired by the CNN model. The actual and predicted percentage of RUL curves are shown in Fig. 11. The result shows that the results predicted by the CNN model show a better fit to the real RUL compared to other models, which signifies that the proposed approach can appropriately predict the bearing RUL. The method is still effective for bearings’ RUL prediction under different working conditions.

Fig.11
figure 11

RUL prediction result: a bearing 1_3, b bearing1_5, c bearing1_7 d bearing2_2, e bearing2_4 and f bearing3_3

To describe the prediction effect of CNN and LSTM models more accurately, the models are evaluated using mean absolute error (MAE), root mean squared error (RMSE), and maximum error (MAX), which are calculated as follows:

$$MAE = \left( {\mathop \sum \limits_{i = 1}^{n} \left| {p_{i} - y_{i} } \right|} \right)/n$$
(14)
$$RMSE = \sqrt {\frac{1}{n}\sum\nolimits_{{i = 1}}^{n} {\left( {p_{i} - y_{i} } \right)^{2} } }$$
(15)
$$MAX = {\text{max}}\left( {\left| {p_{i} - y_{i} } \right|} \right)$$
(16)

The approach proposed to predict RUL is compared with other methods. To indicate the advantage of the proposed feature-fusion HI at RUL prediction, this paper uses the RMS of bearing vibration signals as the HI to predict RUL by CNN, LSTM, and Bi-LSTM. Taking seven bearings, as illustrated, the RUL prediction error results of LSTM-HI, Bi-LSTM-HI, CNN-RMS, LSTM-RMS, Bi-LSTM-RMS, and our method are shown in Fig. 12. The proposed method outperforms the other methods. Most cases in predicting 11 bearings’ RUL indicate that the proposed approach can achieve the lowest MAE, RMSE, and MAX errors. In particular, compared to the method using RMS, the feature fusion-based HI can appropriately predict the bearing RUL, and the prediction error is much lower than that of RUL prediction based on RMS. The lower the error, the higher is the prediction ability of the method.

Fig.12
figure 12

prediction errors of 11 bearings: a MAE, b RMSE, c MAX error

State-of-the-art comparison: In order to illustrate the advantage of the proposed method, the obtained results are compared with five other recent relevant methods on the FEMTO dataset. In particular, the sparse domain adaption network (SDAN) in [41], the method based on time empirical mode decomposition (EMD) and temporal convolutional networks (TCN) in [42], the gated dual attention unit (GDAU) in [43], the multiscale convolutional neural network (MSCNN) in [44], and the deep separable convolutional network (DSCN) in [45] are adopted and labeled as SDAN, EMD- TCN, GDAU, MSCNN and DSCN, respectively. The comparison results are indicated in Fig. 13.

Fig. 13
figure 13

Comparison results with recent methods

As shown in Fig. 13, the RMSE of the proposed approach is the smallest among the six methods, which means that the prediction accuracy of CNN-HI is greatly superior to other methods, and the proposed method achieves the best performance among all these methods.

3.2 Case study on the IMS bearing datasets

This dataset is provided by IMS, University of Cincinnati [46], and it is fully described on NASA’s website. IMS datasets are composed of three data subsets. The second subset is applied to verify the feature fusion-based CNN method, containing 984 files, each of which comprises 20,480 data samples with a sampling frequency of 20 kHz. And the vibration data was collected every 10 min by a NI 6062E DAQ Card.

Unlike the FEMTO dataset bearings, the bearings in this dataset experience longer and more complicated degradation processes, which increases the reliability of this dataset and the difficulty of RUL prediction. Hence, the proposed method predicts the RUL of IMS bearings. The subset includes four bearings, named IMS 2_1 to IMS 2_4. IMS 2_1 is selected as a training set, and IMS 2_2,2_3, and 2_4 are used as testing sets. The prediction results are shown in Fig. 14, and the prediction errors are shown in Fig. 15.

Fig. 14
figure 14

Percentage of life prediction of 3 bearings: a IMS 2_2, b IMS 2_3, c IMS 2_4

Fig. 15
figure 15

prediction errors of 3 bearings: a MAE, b RMSE, c MAX error

A comparison of the life prediction results of the FEMTO and IMS datasets indicates that.

  1. (1)

    The time- and frequency-domain similarity features constructed by this method consider the influence of time on the features and lay the foundation for constructing a suitable HI.

  2. (2)

    Based on the HI constructed in this paper using the CNN model to predict the early degradation of perceptible bearings, the HI constructed by this method can characterize the degradation state of bearings under similar operating conditions, which has a good generalization capability verified by test bearings with different failure modes and failure degrees.

  3. (3)

    The remaining life prediction method based on the CNN model has a better overall effect on predicting the bearing RUL than other neural network models and has a better prediction capability for 1-D time-series signals and relatively better prediction accuracy.

4 Conclusion

The construction of health factors directly impacts the accuracy of the remaining life prediction of bearings. In this study, an HI construction method based on the fusion of similarity features, combined with 1-D CNN, is proposed to predict the RUL of rolling bearings. First, the time- and frequency-domain similarity features are constructed. Second, the Cori index is used to filter out the features with high monotonicity and trend from the three-domain features to construct an HI that can effectively characterize the degradation trend of rolling bearings with universal applicability. Finally, the CNN prediction model is established to predict the RUL of rolling bearings under similar operating conditions, making maximum use of the effective information in the collected signals.