
1 Introduction

Remaining useful life (RUL) prediction is one of the key techniques for prognostics and health management (PHM), which can estimate the remaining time before damage develops beyond the failure threshold, thus avoiding unplanned downtime and improving the safety, reliability and durability of machinery [1, 2]. Generally, RUL prediction methods are developed based on first-principles, degradation mechanisms, or artificial intelligence (AI) techniques. With the wide application of Industrial Internet of Things (IoT), AI-based prediction methods have received more attention in RUL prediction of machinery than other methods because they can automatically mine useful degradation information from monitoring data and infer correlation and causation [3].

Existing AI-based prediction methods can be categorized into two main groups: shallow learning-based methods and deep learning-based methods [4]. Shallow learning-based methods are built on some traditional machine learning models. These models have limited representational learning capabilities and thus require a priori knowledge or domain expertise to preprocess the raw monitoring data from the machine. However, in the era of industrial IoT, where the amount of monitoring data grows exponentially over time, the process of sensitive feature extraction becomes increasingly difficult. To address the massive monitoring data and avoid labor-intensive feature extraction, deep learning, a special class of machine learning, is gradually being applied to RUL prediction of machinery [5].

Deep learning-based methods are constructed by stacking multiple nonlinear processing layers, which gives them a stronger learning capability than shallow learning-based methods. As a result, deep learning-based methods dispense with sensitive feature extraction and can directly use raw monitoring data as inputs for model training and RUL prediction. In the past few years, scholars have employed different types of deep learning models to accomplish the RUL prediction task. Wang et al. [6] constructed a multi-scale convolutional attention network to solve the problem of multi-sensor information fusion and applied it to RUL prediction of cutting tools. Ma et al. [5] proposed a convolution-based long short-term memory (LSTM) network for RUL prediction, in which the convolutional structure embedded in the LSTM captures long-term dependencies while extracting features in the time-frequency domain. Wu et al. [7] built an LSTM autoencoder that considers various degradation trends and adopted it to estimate bearing RUL. Although deep learning-based prediction methods have achieved state-of-the-art results, few studies have considered predicting machinery life under different operating conditions. In some industrial application scenarios, the operating conditions of the actual data often differ greatly from those of the training data, which severely limits the performance of existing prediction methods.

Based on the above analysis, this paper proposes a new network called the bidirectional temporal convolutional network (BDTCN). Its core layer, the bidirectional temporal convolutional layer, performs elaborate information mining and time series modeling along both the forward and reverse directions, so as to fully exploit the machine degradation information and efficiently capture the variation of operating conditions. In addition, to enable the network to predict RUL using data from different operating conditions, this paper develops a new training strategy, namely the adversarial adaptation training strategy, which helps the network extract operating condition-invariant degradation features for RUL prediction.

The remainder of this paper is organized as follows. Section 2 describes in detail the proposed BDTCN and the adversarial adaptation training strategy for mechanical RUL prediction under different operating conditions. Section 3 validates the effectiveness of the proposed method through accelerated degradation experiments on rolling bearings. Finally, conclusions are given in Sect. 4.

Fig. 1. The network structure of BDTCN

2 The BDTCN and the Adversarial Adaptation Training Strategy

In this section, a bidirectional temporal convolutional layer is first constructed to fully exploit the machine degradation information and efficiently capture the variation of operating conditions in RUL prediction. Then, an adversarial adaptation training strategy is developed to essentially enhance the robustness of BDTCN's RUL prediction under different operating conditions.

2.1 Proposed BDTCN

The network structure of BDTCN is shown in Fig. 1. It consists of a representation learning sub-network and a RUL prediction sub-network. The representation learning sub-network consists of D bidirectional temporal convolutional layers and D max pooling layers stacked alternately, after which the RUL prediction sub-network uses two fully connected layers for RUL prediction. Specifically, the proposed BDTCN embeds temporal information into the network inputs using a time window of length S. These input data have interdependencies on time scales, and such dependencies are crucial for accurate RUL prediction under different operating conditions. Therefore, the BDTCN introduces a new network layer, the bidirectional temporal convolutional layer, to enhance the temporal information mining capability of the prognostics network.

Fig. 2. Architecture of bidirectional temporal convolutional layer

As shown in Fig. 2, the bidirectional temporal convolutional layer contains a forward temporal convolution operation and a reverse temporal convolution operation. The bidirectional temporal convolution operation can be formulated as follows:

$$ \overrightarrow {z} _{{n,t}}^{l} = \sigma _{r} \left( {\overrightarrow {k} _{n}^{l} * _{d} x_{t}^{{l - 1}} + \overrightarrow {b} _{n}^{l} } \right) = \sigma _{r} \left( {\sum\limits_{{i = 0}}^{{M - 1}} {\overrightarrow {k} _{{n,i}}^{l} } \cdot x_{{t - d \times i}}^{{l - 1}} + \overrightarrow {b} _{n}^{l} } \right) $$
(1)
$$ \overleftarrow {z} _{{n,t}}^{l} = \sigma _{r} \left( {\overleftarrow {k} _{n}^{l} * _{d} x_{t}^{{l - 1}} + \overleftarrow {b} _{n}^{l} } \right) = \sigma _{r} \left( {\sum\limits_{{i = 0}}^{{M - 1}} {\overleftarrow {k} _{{n,i}}^{l} } \cdot x_{{t + d \times i}}^{{l - 1}} + \overleftarrow {b} _{n}^{l} } \right) $$
(2)

where \(\sigma_{r} \left( \cdot \right)\) is a nonlinear activation function, \({\varvec{x}}_{t}^{l - 1}\) is the input tensor of the l-th bidirectional temporal convolutional layer at time t, \(\overrightarrow {z}_{n,t}^{l}\) and \(\overleftarrow {z}_{n,t}^{l}\) are the output tensors of the forward and reverse temporal convolutions at time t, \(\overrightarrow {k}_{n}^{l}\) and \(\overleftarrow {k}_{n}^{l}\) are the n-th forward and reverse temporal convolution kernels of length M, \(*_{d}\) denotes the dilated convolution operation with dilation rate d, and \(\overrightarrow {b}_{n}^{l}\) and \(\overleftarrow {b}_{n}^{l}\) are the corresponding bias terms. In particular, the number of forward and reverse temporal convolution kernels in the l-th bidirectional temporal convolutional layer is \(2^{{\left( {l - 1} \right)}} N\) and the dilation rate is \(2^{{\left( {l - 1} \right)}}\). After that, the outputs of these two convolutions are fed separately to the max pooling layer, which is formulated as follows:

$$ {\vec{\mathbf{u}}}_{n}^{l} = {\text{pool}} \left( {{\vec{\mathbf{z}}}_{n}^{l} ,p,s} \right) $$
(3)
$$ {\mathbf{\mathop{u}\limits^{\leftarrow} }}_{n}^{l} = {\text{pool}} \left( {{\mathbf{\mathop{z}\limits^{\leftarrow} }}_{n}^{l} ,p,s} \right) $$
(4)

where p is the pooling size and s is the pooling stride. Note that in the example of Fig. 2, the number of convolution kernels is set to 1 for ease of observation and no maximum pooling layer is added.
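To make the operations in Eqs. (1)–(3) concrete, the following is a minimal single-channel NumPy sketch, assuming "valid" (no-padding) convolution positions, ReLU as the activation \(\sigma_{r}\), and strided max pooling; the helper names are illustrative, not from the paper:

```python
import numpy as np

def relu(a):
    # elementwise nonlinear activation sigma_r
    return np.maximum(a, 0.0)

def forward_temporal_conv(x, k, b, d):
    # Eq. (1): z_t = relu(sum_i k_i * x_{t - d*i} + b),
    # evaluated only where all dilated taps fall inside the signal
    M, T = len(k), len(x)
    return np.array([
        relu(sum(k[i] * x[t - d * i] for i in range(M)) + b)
        for t in range(d * (M - 1), T)
    ])

def reverse_temporal_conv(x, k, b, d):
    # Eq. (2): same kernel sum, but taps run forward in time (t + d*i)
    M, T = len(k), len(x)
    return np.array([
        relu(sum(k[i] * x[t + d * i] for i in range(M)) + b)
        for t in range(0, T - d * (M - 1))
    ])

def max_pool(z, p, s):
    # Eq. (3): max pooling with pooling size p and stride s
    return np.array([z[j:j + p].max() for j in range(0, len(z) - p + 1, s)])
```

With a dilation rate of 2 and a kernel of length 2, each output mixes samples two steps apart, which is how the stacked layers (dilation \(2^{(l-1)}\)) enlarge their temporal receptive field without adding parameters.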

2.2 Adversarial Adaptation Training for BDTCN

This section proposes the adversarial adaptation training strategy shown in Fig. 3, which accounts for the variability between monitoring data collected under different operating conditions and adapts the BDTCN to new operating conditions through generative adversarial training. The strategy first initializes the BDTCN using the source operating condition dataset, yielding the initialized representation learning sub-network \(F\left( \cdot \right)\) and RUL prediction sub-network \(R\left( \cdot \right)\). Then, a generative adversarial network is constructed to obtain operating condition-invariant degradation features: the generator is the representation learning sub-network, and the discriminator consists of three fully connected layers. The generator and the discriminator optimize their respective network parameters alternately through adversarial training until a Nash equilibrium is reached, at which point the generator outputs operating condition-invariant representations that the discriminator cannot assign to either operating condition. The objective function is defined as follows:

Fig. 3. An illustration of adversarial adaptation training

$$ \mathop {\min }\limits_{{F^{*} }} \mathop {\max }\limits_{D} {\mathcal{L}}\left( {D,F^{*} } \right) = {\mathbb{E}}_{{{\varvec{X}}_{t} \sim p_{t} \left( {{\varvec{X}}_{t} } \right)}} \left[ {\log \left( {D\left( {F\left( {{\varvec{X}}_{t} } \right)} \right)} \right)} \right] + {\mathbb{E}}_{{{\varvec{X}}_{s} \sim p_{s} \left( {{\varvec{X}}_{s} } \right)}} \left[ {\log \left( {1 - D\left( {F^{*} \left( {{\varvec{X}}_{s} } \right)} \right)} \right)} \right] $$
(5)

where \({\varvec{X}}_{s}\) are the samples from the source operating condition dataset, \({\varvec{X}}_{t}\) are the samples from the target operating condition dataset, and \(F^{*} \left( \cdot \right)\) is the generator after its network parameters are updated. The discriminator \(D\left( \cdot \right)\) is fixed while the parameters of the generator \(F\left( \cdot \right)\) are updated, and the objective function of the generator can be formulated as follows:

$$ \mathop {\min }\limits_{{F^{*} }} {\mathcal{L}}\left( {F^{*} } \right) = {\mathbb{E}}_{{{\varvec{X}}_{s} \sim p_{s} \left( {{\varvec{X}}_{s} } \right)}} \left[ {\log \left( {1 - D\left( {F^{*} \left( {{\varvec{X}}_{s} } \right)} \right)} \right)} \right] $$
(6)

Similarly, the objective function of the discriminator can be expressed as follows:

$$ \mathop {\max }\limits_{D} {\mathcal{L}}\left( D \right) = {\mathbb{E}}_{{{\varvec{X}}_{t} \sim p_{t} \left( {{\varvec{X}}_{t} } \right)}} \left[ {\log \left( {D\left( {F\left( {{\varvec{X}}_{t} } \right)} \right)} \right)} \right] + {\mathbb{E}}_{{{\varvec{X}}_{s} \sim p_{s} \left( {{\varvec{X}}_{s} } \right)}} \left[ {\log \left( {1 - D\left( {F^{*} \left( {{\varvec{X}}_{s} } \right)} \right)} \right)} \right] $$
(7)

The above adversarial domain adaptation training strategy extracts operating condition-invariant representations, thus enabling RUL prediction on data from different operating conditions.
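The value function in Eq. (5) can be estimated empirically from mini-batches of features. The sketch below uses a single logistic unit as a stand-in for the three-layer discriminator (an assumption for brevity; `adversarial_objective`, `w`, and `c` are illustrative names, not from the paper):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def adversarial_objective(f_t, f_s, w, c):
    """Empirical estimate of Eq. (5).

    f_t: features F(X_t) of target-condition samples, shape (n_t, dim)
    f_s: features F*(X_s) of source-condition samples, shape (n_s, dim)
    w, c: parameters of a logistic discriminator D(f) = sigmoid(w.f + c)
    """
    d_t = sigmoid(f_t @ w + c)  # D(F(X_t))
    d_s = sigmoid(f_s @ w + c)  # D(F*(X_s))
    # E[log D(F(X_t))] + E[log(1 - D(F*(X_s)))]
    return np.mean(np.log(d_t)) + np.mean(np.log(1.0 - d_s))
```

The discriminator ascends this quantity (Eq. (7)) while the generator descends its second term (Eq. (6)); at the Nash equilibrium the discriminator outputs 1/2 for every sample, and the objective equals \(-\log 4\).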

Finally, the RUL prediction sub-network is fine-tuned using the source operating condition dataset; note that the parameters of the representation learning sub-network \(F^{*} \left( \cdot \right)\) are fixed in this process and only the parameters of the RUL prediction sub-network \(R\left( \cdot \right)\) are updated.

3 Experimental Results and Analysis

3.1 Data Description

To validate the effectiveness of the proposed method, accelerated degradation tests were performed on a rolling bearing testbed as shown in Fig. 4. The tested bearings were driven by an alternating current (AC) motor, and the radial force was applied by a hydraulic loading device. The run-to-failure data of the tested bearings were collected by horizontal and vertical accelerometers with a sampling frequency of 25.6 kHz, a sampling interval of 12 s and a sampling duration of 1.28 s per interval. As shown in Table 1, a total of 14 LDK UER204 ball bearings were subjected to accelerated degradation under three different operating conditions. The tested bearings under the first operating condition were used as the source-domain data, and the tested bearings under the last two operating conditions were used as the target-domain data. In addition, the first two tested bearings in each target condition were used for adversarial domain adaptation, and the last three were used as the testing set.
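As a quick sanity check on the acquisition settings above (25.6 kHz sampling, 1.28 s per snapshot, one snapshot every 12 s), the size and rate of the vibration snapshots follow directly:

```python
# Points captured in each 1.28 s snapshot at 25.6 kHz
points_per_snapshot = round(25_600 * 1.28)  # 32768 points per channel

# One snapshot is taken every 12 s
snapshots_per_minute = 60 // 12  # 5 snapshots per minute
```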

Fig. 4. Rolling bearing testbed

Table 1. Tested bearings for RUL prediction

3.2 Ablation Experiments

To validate the effectiveness of the BDTCN and the adversarial adaptation training strategy, ablation experiments were performed in this section. Method 1 uses a standard convolutional layer instead of the bidirectional temporal convolutional layer and does not use the adversarial adaptation training strategy. Method 2 is constructed with standard convolutional layers and trained with the adversarial adaptation strategy. Method 3 retains the bidirectional temporal convolutional layer but does not use the adversarial adaptation training strategy. The performance of the compared methods is shown in Table 2, from which the following observations can be made. The RMSE of Method 1 is the largest among all compared methods, which indicates that a network trained directly on data from a single operating condition struggles to generalize to a new operating condition, resulting in large errors between the predicted and true values. In contrast, the BDTCN uses the bidirectional temporal convolutional layer for feature extraction and is trained with the adversarial adaptation strategy, so its RMSE is the smallest among all compared networks.
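The evaluation metric used in the comparisons is assumed here to be the standard root-mean-square error between the predicted and true RUL sequences (the paper does not spell out the formula); a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted RUL values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```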

3.3 Comparison with Existing Methods

This section compares the proposed method with three existing prognostics methods for RUL prediction. Method 4 [8] is constructed based on sparse autoencoders. Method 5 [9] and Method 6 [10] are built on convolutional neural networks and convolutional long short-term memory networks, respectively. Table 2 summarizes the performance evaluation results of the proposed method and the three existing transfer prediction methods. From the table, it can be seen that the proposed method obtains the minimum RMSE value for the RUL of the tested bearings under each operating condition. This indicates that, compared with the three existing methods, the proposed BDTCN achieves higher prediction accuracy and more stable prediction results under different operating conditions.

Table 2. Performance evaluation of compared methods

4 Conclusion

An adversarial adaptation framework is proposed for RUL prediction under different operating conditions. First, a new network, named BDTCN, is proposed to extract the interdependencies of input data on time scales through forward and reverse convolution operations, capturing key degradation features associated with operating conditions. Then, an adversarial adaptation training strategy is developed to help the BDTCN further extract operating condition-invariant degradation features. The proposed framework is evaluated through ablation experiments and comparison with existing methods. The experimental results demonstrate the effectiveness and superiority of the framework in RUL prediction.