Introduction

Engineering maintenance and prognostics are crucial in most industries. Traditionally, breakdown corrective maintenance and scheduled preventive maintenance have been the key maintenance strategies (Azadeh et al. 2015). However, as physical systems have become more complex, these existing methods no longer meet industrial requirements for efficiency and reliability (Stringer et al. 2012). Thus, intelligent prognostics and health management (PHM) technologies have been developed to perform diagnosis and maintenance management efficiently by monitoring sensor data (Vogl et al. 2019). The goal of intelligent PHM is to reduce maintenance costs and improve reliability by monitoring the system state (Xia et al. 2018).

With sufficient condition monitoring (CM) data, the PHM process can be used to estimate the remaining useful life (RUL) (Babu et al. 2016); that is, the future performance of the system can be predicted in terms of the RUL from the monitored degradation data. Efficient and accurate prognostic technologies help to make maintenance decisions in advance and thereby avoid failures; thus, RUL estimation based on historical sensor measurements has become one of the most important PHM activities.

The sensor measurements can also be used for other PHM processes, such as health condition identification, anomaly detection, and system diagnosis (Lei et al. 2008; Niu and Yang 2010). Integrated PHM systems have been introduced to accomplish these various PHM processes based on the collected sensor data (Khan and Yairi 2018). In most cases, however, each task has been treated independently: a separate model is constructed for each task, and the resulting system complexity makes it difficult to consider other PHM processes together with RUL prediction (Khan and Yairi 2018). Moreover, there are potential relationships between these other PHM processes and RUL prediction (Zaidan et al. 2015). Therefore, performing RUL prediction simultaneously with other processes can improve the performance of the PHM system while reducing its complexity.

In this paper, we suggest a data-driven approach that simultaneously accomplishes health condition identification and RUL estimation of a complex system. Health condition identification measures the current condition of the system. If the current state can be specified, the remaining operable time can be assessed, which helps to evaluate the remaining life of the system from its current state (Patton et al. 2013). Thus, a multi-task learning (MTL) framework is suggested to infer the RUL of the current system while reflecting its present health condition.

MTL has been used to improve performance by utilizing the domain-specific information contained in related tasks (Baxter 1997). Learning multiple related tasks simultaneously has been shown, both theoretically and empirically, to improve performance compared to learning each task independently (Girshick 2015). A traditional MTL architecture based on a neural network is composed of two partial networks: a shared network and task-specific networks. The shared network uses the same hidden neurons across different tasks, while the task-specific networks are kept independent to separate the information of individual tasks. By connecting these networks, a shared representation is developed concurrently across tasks. Thus, MTL can reduce the risk of overfitting and improve the performance of the original single task.

In this study, one shared network and two task-specific networks are organized: the former to capture the interdependencies between RUL estimation and health condition identification, and the latter to perform each task. By capturing the interdependencies of related integrated health management processes, the proposed MTL framework is expected to improve prognostic accuracy. In the proposed MTL model, convolutional neural network (CNN) layers are employed as the basic structure of the shared layers to extract global features from complex signals. Raw sensor data is input directly to the CNN-based MTL structure, and a multi-variate 1-D filter is utilized for feature extraction, reflecting the time-varying relationship among the multi-sensor data. The proposed framework was applied to the C-MAPSS dataset, which provides a RUL label and records the system’s degradation information at every cycle (Saxena et al. 2008). Our MTL framework is expected to contribute to managing and maintaining system conditions by improving RUL prediction performance.

The remainder of this paper is organized as follows. The second section reviews previous studies on health management. The third and fourth sections explain the data and methodology used in this study, and the fifth section presents the results of the empirical analysis. Finally, the sixth section presents the conclusions and contributions of the study.

Literature review

Data-driven approaches for the estimation of remaining useful life

In recent years, data-driven approaches have been used to discover the relationship between monitored condition data and the RUL (Lee et al. 2014). One advantage of data-driven approaches to RUL estimation in condition-based maintenance (CBM) is that extensive prior knowledge of the physical system is not required. Moreover, they can reflect the intrinsic correlations and causalities among sensor measurements, which leads to good prediction performance. Various data-driven algorithms aim to exploit these benefits. Riad et al. (2010) applied a multi-layer perceptron (MLP) to estimate the RUL of an aircraft turbofan engine, demonstrating that the prediction performance of the MLP is superior to that of a simple linear regression model. In another approach, Tian (2012) proposed an artificial neural network (ANN) based method to estimate the RUL of a physical instrument using monitored data.

Recently, deep neural network–based techniques, also known as deep learning approaches, have been applied to RUL estimation because of their ability to handle complex and high-dimensional systems (LeCun et al. 2015). Malhi et al. (2011) suggested a recurrent neural network (RNN) based approach for long-term RUL prognostics. Their model captured the long-term dependency of sequential vibration signals and could thus measure the RUL of rolling bearings from real-time signals. To address the drawbacks of traditional RNNs, such as vanishing or exploding gradients, Yuan et al. (2016) proposed a long short-term memory (LSTM) based framework, showing that the LSTM outperformed the RNN in predicting the RUL of an aero-engine. Several other studies have suggested RNN- and LSTM-based approaches for RUL estimation because the recurrent network structure is effective for time-series data (Malhotra et al. 2016; Lim et al. 2016; Gugulothu et al. 2017; Guo et al. 2017; Yoon et al. 2017; Wu et al. 2018).

Because recurrent networks impose a heavy computational burden, CNNs, which are designed to extract features through weight-sharing filters, have also been utilized for RUL prediction. Babu et al. (2016) designed a two-dimensional (2D) CNN model to consider the multivariate relationships of the sensor data; sequential monitoring data was used as the raw input, and temporal patterns were also captured through the convolution layers. Li et al. (2018b) designed a similar CNN approach but used a univariate 1-D CNN filter to reflect the sequential patterns of each sensor; independent temporal features were extracted for each sensor and then concatenated in the output layer to estimate the RUL. However, none of these approaches utilized MTL.

Multi-task learning

MTL is an approach that handles multiple tasks simultaneously (Caruana 1997). One purpose of MTL is to improve single-task generalization by reflecting the domain-specific information in the related tasks, which are also called auxiliary tasks (Ando and Zhang 2005). An additional aim is to combine the common knowledge of all tasks and enhance their performance simultaneously (Pan and Yang 2010). To achieve these objectives, the tasks must be related; multiple tasks can be related in various ways, for example, when the functions learned for each task behave similarly or the tasks share a common aim in the corresponding domain (Argyriou et al. 2007).

Implicit data augmentation, attention focusing, eavesdropping, and representation bias are some of the properties that allow MTL to perform better than independent learning (Caruana 1997; Ruder 2017; Zhang and Yang 2017). Essentially, deep learning utilizes a large network with numerous parameters, which requires a large amount of training data. One of the drawbacks of deep learning is that it is rendered ineffective by insufficient data samples (Marcus 2018). However, MTL utilizes the datasets from all tasks; it effectively increases the training sample size and extracts a more general representation from multiple tasks with different noise patterns (Ruder 2017). Additionally, it mainly updates model parameters from focused features that are commonly important to all the tasks, thus providing additional information on the relevance of features to other tasks. MTL also allows for most features to be effectively trained by different tasks (Caruana 1997); even if one task can only train certain features, the other features can be learned through other tasks. Finally, the model is trained using general representations from all related tasks, so a trained model can accept new related tasks in the same domain (Ruder 2017). This helps to improve the model’s adaptability to new tasks.

Problem settings

C-MAPSS data

The Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset is derived from a turbofan engine simulation program developed by NASA (Saxena et al. 2008). The simulation program monitored the degradation of multiple aero-engines and collected multi-variate time-series measurements from 21 sensors. The initial wear state and manufacturing variation of each engine unit are unknown, but each unit is assumed to start in a healthy state. The C-MAPSS dataset consists of four sub-datasets, depending on whether the experimental setting has a single operating condition or different operating conditions for each cycle and on which faults (i.e., fan degradation or HPC degradation) are considered in the engine. Thus, each sub-dataset contains run-to-failure sensor records collected under its own combination of operating conditions and fault modes.

Each sub-dataset includes a training set and a test set. In the training set, system degradation is observed as time progresses, and the last recorded data point of each engine unit is considered the fault declaration and the termination of the experiment. The multiple sensor measurements of each cycle are used as inputs for the training samples, and all of them are used to predict the RUL. Records in the test set are pruned to stop before failure, and the aim of the C-MAPSS task is to predict the true RUL of the last record in the test set. This implies that only the sensor measurement of the last record per engine unit is considered as the testing sample, and its actual RUL value is used for verification. The C-MAPSS dataset information is shown in Table 1.

Table 1 Information of the C-MAPSS dataset
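
As an illustration of this data layout, the sketch below loads one training sub-dataset and derives the per-cycle RUL target from the last recorded cycle of each unit. It follows the commonly distributed C-MAPSS text format (unit id, cycle, three operational settings, 21 sensors); the file name and column names are assumptions.

```python
import pandas as pd

# Assumed column layout of the standard C-MAPSS text files:
# unit id, cycle, 3 operational settings, 21 sensor measurements.
cols = (["unit", "cycle"]
        + [f"op{i}" for i in range(1, 4)]
        + [f"s{i}" for i in range(1, 22)])

train = pd.read_csv("train_FD002.txt", sep=r"\s+", header=None)
train = train.dropna(axis=1, how="all")   # guard against trailing blank columns
train.columns = cols

# Training target: RUL of each row = last recorded cycle of its unit - current cycle
train["rul"] = train.groupby("unit")["cycle"].transform("max") - train["cycle"]
```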

Data preprocessing

Although the C-MAPSS dataset contains multi-variate temporal data, some sensors record constant values that carry no information for RUL estimation. These sensors are therefore eliminated, and only 14 of the 21 sensors are used as the raw input data (sensors 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21).
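
Under the column names assumed in the loading sketch above, this selection amounts to keeping the listed sensor columns:

```python
# Keep only the 14 informative sensors (column names assumed as "s1".."s21").
SELECTED_SENSORS = [2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, 21]
sensor_cols = [f"s{i}" for i in SELECTED_SENSORS]
train = train[["unit", "cycle", "op1", "op2", "op3", "rul"] + sensor_cols]
```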

For each of the sub-datasets in C-MAPSS, the experimental scenarios differ according to the operating conditions. FD001 and FD003 have a single operating condition, while FD002 and FD004 have six different operating conditions. Because the monitored values depend on the operating conditions, the actual meaning of a sensor value can differ according to the operating condition (Rabiei and Modarres 2013). Thus, condition-wise normalization is performed on the sensor data of FD002 and FD004 to adjust the scale of the sensor values in accordance with the operating conditions. This normalization removes the scale differences caused by the operating conditions so that the degradation trends in the sensor values become comparable for RUL estimation. The normalization equation is as follows:

$$ Norm\left( x^{(o,d)} \right) = \frac{x^{(o,d)} - \mu^{(o,d)}}{\sigma^{(o,d)}} $$
(1)

where \( x^{(o,d)} \) denotes the values of the d-th sensor under the o-th operating condition, and \( \mu^{(o,d)} \) and \( \sigma^{(o,d)} \) represent the mean and standard deviation of those sensor values for each operating condition. FD001 and FD003, on the other hand, have only one operating condition; thus, their normalization is performed per sensor over the whole sub-dataset.
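
A minimal sketch of this condition-wise normalization, assuming the data sits in a pandas DataFrame and that a discrete operating-condition label (e.g., obtained by clustering the three operational settings into the six known regimes) is available in a column:

```python
import pandas as pd

def condition_wise_normalize(df, sensor_cols, condition_col):
    """Normalize each sensor separately within each operating condition
    (Eq. 1): subtract the per-condition mean and divide by the
    per-condition standard deviation."""
    grouped = df.groupby(condition_col)[sensor_cols]
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    df[sensor_cols] = (df[sensor_cols] - mean) / std
    return df
```

For FD001 and FD003, calling the same function with a single constant condition label reduces to plain per-sensor normalization.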

The normalized sensor values are used to estimate both the actual RUL value and the present health condition. Because the asset is healthy at the beginning of its life, its RUL is assumed not to decrease during the early stage. Therefore, to obtain the RUL target in this study, a healthy state is maintained during the early stages by using a piece-wise linear degradation model that limits the maximum RUL value (Heimes 2008). Because degradation is identified only after the initial healthy period, this approach prevents overestimation of the RUL (Babu et al. 2016). We set the maximum RUL value to 125, as in Malhotra et al. (2016) and Li et al. (2018b). Meanwhile, the C-MAPSS dataset contains no information about the present health condition. Thus, three health condition labels are assumed: the first 30% of each unit’s samples are labeled healthy, the last 30% unhealthy, and the remaining 40% deteriorating (Yuan et al. 2016). In this manner, every cycle in our dataset carries not only the true RUL value but also the present health condition. All functions in the suggested MTL model are learned simultaneously from both targets.
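
The two training targets can then be constructed per engine unit as in the sketch below. It assumes the RUL is zero at the unit's last recorded cycle and uses the 30/40/30 split described above; the function name and label encoding (1 = healthy, 2 = deteriorating, 3 = unhealthy) are illustrative.

```python
import numpy as np

def make_targets(cycles, max_rul=125):
    """Piece-wise linear RUL target (capped at max_rul) and a health
    condition label per cycle for one engine unit."""
    cycles = np.asarray(cycles, dtype=float)
    T = cycles.max()                              # last recorded cycle of the unit
    rul = np.minimum(T - cycles, max_rul)         # constant early, then linear decay
    frac = cycles / T                             # position within the unit's life
    health = np.where(frac <= 0.3, 1,             # 1 = healthy (first 30%)
             np.where(frac <= 0.7, 2, 3))         # 2 = deteriorating, 3 = unhealthy
    return rul, health
```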

Methodology

Architecture

In this study, we propose a deep architecture to effectively predict the RUL by simultaneously learning two tasks: RUL estimation and health condition identification. To estimate the RUL value and the present health condition simultaneously, an MTL model is constructed. The proposed model captures the interdependencies between the two tasks, offering supportive knowledge to the RUL estimation task and thereby improving prognostic performance.

Because sequential sensor measurements contain temporal patterns, a recurrent network is a natural candidate for the deep architecture underlying MTL. However, recurrent network–based approaches require relatively long computational time because, unlike CNNs, the recurrent structure does not allow for increased parallelization (Vaswani et al. 2017). MTL already has the drawback of high computational complexity because it requires large networks to reflect the combined information from multiple tasks, whereas a CNN effectively reduces computational time by exploiting parallelization. Therefore, a CNN architecture was applied as the basic deep learning structure of our proposed MTL framework.

Instead of using a recurrent architecture, a multi-variate 1D filter was applied to the CNN architecture to extract a better feature representation from the sequential multi-sensor measurements. The 1D filter can capture the underlying relationship between multiple sensors, and the temporal patterns of those sensors are reflected by sliding the filter. The last feature map extracted by this filter was used as the input vector of separate networks for RUL estimation and health condition identification.

During the training process, the true RUL value and health condition were set as the learning objectives of our model, and all functions were updated by both complementary targets. In the testing process, the prediction values of both tasks were acquired, and the performance of our multi-task model was evaluated by comparing the RUL performance with other baselines. In the following sections, each process is described in detail.

Proposed multi-task network structure

An MTL network has two types of layers: shared layers and task-specific output layers. Usually, the bottom layers of the network share domain-specific information, whereas the top layers separate task-specific information. The shared layers share the parameters of the hidden layers, while the task-specific output layers retain their own parameters. The former extract the general features of the corresponding domain, while the latter focus on performing the individual tasks. In this study, the shared layers are composed of fully convolutional CNN layers, and two separate fully connected layers (FCLs) serve as the task-specific output layers, as shown in Fig. 1.

Fig. 1

Proposed multi-task network structure, where \( N_{l} \) is the number of convolution layers and \( N_{f} \) is the number of kernel filters in each convolution layer

All parameters in the multi-tasking network were jointly optimized to simultaneously improve task generalization and individual model performance. The convolution parameters, which are shared network parameters, were updated to reflect the common knowledge of CBM. Meanwhile, the FCL parameters, which are task-specific network parameters, learned independent information corresponding to the RUL estimation and health condition identification, respectively. The loss function of each task affects both the individual task and the shared parameters.

The target RUL variable is of interval type, but the health condition is an ordinal target with three levels, represented by 1 for healthy, 2 for deteriorating, and 3 for unhealthy (k = 1, 2, 3). To reflect this ordinal target directly in the neural network, we utilized two binary classifiers (Niu et al. 2016) with two dummy variables, so that the target vectors of healthy, deteriorating, and unhealthy are {0,0}, {1,0}, and {1,1}, respectively. The neural network was trained to predict each binary target (Cheng et al. 2008). In the testing process, unseen samples were ranked by scanning from the first to the second classifier. Each sample had two predicted probabilities; each probability higher than a certain threshold was set to 1, and otherwise to 0. The threshold value was set to 0.5. The rank of a sample was then determined by the number of 1s across the classifiers. Estimating the RUL and classifying the health condition yield a regression loss and an ordinal regression loss, respectively, as follows:

$$ \begin{aligned} L_{cls}\left( \hat{H}_{i}^{r}, H_{i}^{r} \right) & = - \frac{1}{N}\sum_{i = 1}^{N} \sum_{r = 1}^{2} \left\{ H_{i}^{r} \log \hat{H}_{i}^{r} + \left( 1 - H_{i}^{r} \right)\log \left( 1 - \hat{H}_{i}^{r} \right) \right\} \\ L_{reg}\left( \hat{R}_{i}, R_{i} \right) & = \sqrt{\frac{1}{N}\sum_{i = 1}^{N} \left( \hat{R}_{i} - R_{i} \right)^{2}} \end{aligned} $$
(2)

where N represents the number of data samples, \( L_{reg} \) is the root mean squared error (RMSE) loss between the target RUL \( R \) and the predicted RUL \( \hat{R} \), and \( L_{cls} \) is the combined sigmoid cross-entropy loss of the two binary classifiers, with \( H^{r} \) the true and \( \hat{H}^{r} \) the predicted health condition targets for each sample. In multi-task learning, the loss function of each task affects both the corresponding task-specific layers and the shared layers. Thus, the parameters of the task-specific layers are updated to improve the performance of each task, while the parameters of the shared layers are updated to extract good general features for CBM. The entire proposed model is learned from the weighted sum of these two losses; here both losses are weighted equally, at a ratio of 1:1.
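
A minimal sketch of the ordinal encoding, the joint loss, and the test-time decoding, written in PyTorch purely for illustration (the paper does not specify a framework). The function names are hypothetical, and the 1:1 loss weighting follows the description above.

```python
import torch
import torch.nn.functional as F

def ordinal_encode(health):
    """Encode health levels 1/2/3 as two cumulative binary targets:
    healthy -> {0,0}, deteriorating -> {1,0}, unhealthy -> {1,1}."""
    h = health.view(-1, 1)
    return torch.cat([(h >= 2).float(), (h >= 3).float()], dim=1)

def multitask_loss(rul_pred, rul_true, health_logits, health_true):
    """Joint loss (Eq. 2): RMSE on the RUL head plus the sigmoid
    cross-entropy of the two ordinal classifiers, combined 1:1."""
    l_reg = torch.sqrt(F.mse_loss(rul_pred, rul_true))
    bce = F.binary_cross_entropy_with_logits(
        health_logits, ordinal_encode(health_true), reduction="none")
    l_cls = bce.sum(dim=1).mean()      # sum over the 2 classifiers, mean over samples
    return l_reg + l_cls

def ordinal_decode(health_logits, threshold=0.5):
    """Recover the health level by counting classifiers whose probability
    exceeds the threshold: 0, 1, or 2 ones -> levels 1, 2, or 3."""
    probs = torch.sigmoid(health_logits)
    return (probs > threshold).sum(dim=1) + 1
```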

Multi-variate one-dimensional filter

Although the CNN was originally designed for image processing, it has become one of the most popular and successful deep learning methodologies in many research fields. In RUL estimation, CNN has been used in various studies and shown excellent prediction results (Babu et al. 2016; Li et al. 2018b). The CNN identifies the intrinsic patterns of input data through localized filters and spatial pooling (Krizhevsky et al. 2012). The convolution layer extracts abstract spatial features of the signals, while the pooling layer obtains significant values in local features.

As shown in Fig. 2, our raw data were pre-processed into a 2D matrix containing the D selected raw sensor signals over a time sequence of length \( N_{w} \), where D is the number of sensors and \( N_{w} \) is the time window size, i.e., the number of sensor measurement cycles considered. Thus, the input data has two intrinsic properties: the potential relationship among the multiple sensors and the relationship along the time series. Generally, CNNs extract feature representations by sliding a convolution filter horizontally and vertically across the input map, essentially reflecting local correlations among pixels under the assumption that they are related to their neighbors. However, because the physical positions of the sensors were unknown, there was no information about local sensor correlations in our dataset. Thus, the size of one filter dimension was fixed to the number of sensors D. This captures the correlation between all the sensors at once, leaving only the filter length \( N_{l} \), that is, the number of cycles covered by the filter, to be decided. By sliding the filter vertically, the temporal properties of the incorporated multi-sensor values can be captured simultaneously.

Fig. 2

Structure of CNN layer with one-dimensional filter

Because this multi-variate one-dimensional filter can simultaneously reflect all the sensor values, the output value represents the combined values of multiple sensors. The convolution operation based on this filter is as follows:

$$ Z^{l} = \left[ X^{l-1} * k^{l}_{1} + b^{l}_{1}, \; \ldots, \; X^{l-1} * k^{l}_{N_f} + b^{l}_{N_f} \right], \quad X^{l} = ELU\left( Z^{l} \right) $$
(3)

where \( * \) represents the convolution operator, \( l \) denotes the \( l \)-th convolution layer, and \( X^{l-1} \) and \( Z^{l} \) represent the input and output of the \( l \)-th convolution layer, respectively. \( ELU \) is the activation function used for the non-linear transformation between layers, and \( k^{l} \) and \( b^{l} \) represent the filter kernels and biases of the \( l \)-th convolution layer. The output feature maps are concatenated and passed to the next convolution layer; after the last convolution layer, a separate FCL is employed for each task, although both FCLs receive the same input feature vector from the shared network. The feature maps produced by the filter kernels of the last convolution layer are flattened and concatenated to form this input feature vector. Dropout is then applied to address the overfitting problem, and output neurons for regression and ordinal classification are attached at the end of the two FCLs for RUL estimation and health condition identification, respectively.
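
The sketch below shows how the shared convolution stack and the two task-specific FCL heads could fit together, again using PyTorch for illustration. The multi-variate 1D filter is realized as a 1D convolution whose input channels span all selected sensors and whose kernel slides over time; the layer sizes are placeholders, not the tuned values reported in Table 2.

```python
import torch
import torch.nn as nn

class MTCNN(nn.Module):
    """Illustrative multi-task CNN: shared 1D-convolution layers followed by
    two task-specific fully connected heads (RUL regression, 2 ordinal logits)."""

    def __init__(self, n_sensors=14, window=30, n_conv=4,
                 n_filters=32, kernel=10, n_hidden=100, dropout=0.5):
        super().__init__()
        layers, in_ch = [], n_sensors
        for _ in range(n_conv):                       # shared convolution stack
            layers += [nn.Conv1d(in_ch, n_filters, kernel, padding="same"), nn.ELU()]
            in_ch = n_filters
        self.shared = nn.Sequential(*layers)
        flat = n_filters * window                     # flattened shared feature map
        self.rul_head = nn.Sequential(                # task-specific FCL: regression
            nn.Linear(flat, n_hidden), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(n_hidden, 1))
        self.health_head = nn.Sequential(             # task-specific FCL: ordinal logits
            nn.Linear(flat, n_hidden), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(n_hidden, 2))

    def forward(self, x):                             # x: (batch, n_sensors, window)
        z = self.shared(x).flatten(1)
        return self.rul_head(z).squeeze(-1), self.health_head(z)
```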

Evaluation metrics

In this study, a multi-task model for RUL prediction and health condition identification is proposed. The performance of RUL prediction is primarily measured by the RMSE metric. In addition, a scoring function is used as another evaluation metric for RUL estimation. The following scoring function, introduced in the PHM 2008 challenge, has been used in many studies (Babu et al. 2016; Zhang et al. 2017; Li et al. 2018b):

$$ L_{score} = \begin{cases} \sum_{i = 1}^{N} \left( e^{-\frac{\hat{R}_{i} - R_{i}}{13}} - 1 \right) & \text{for } \hat{R}_{i} - R_{i} < 0 \\ \sum_{i = 1}^{N} \left( e^{\frac{\hat{R}_{i} - R_{i}}{10}} - 1 \right) & \text{for } \hat{R}_{i} - R_{i} \ge 0 \end{cases} $$
(4)
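
For reference, both RUL metrics can be computed directly from the prediction errors; the sketch below follows Eq. 4, in which late predictions are penalized more heavily than early ones.

```python
import numpy as np

def phm08_score(rul_pred, rul_true):
    """PHM 2008 challenge score (Eq. 4): asymmetric exponential penalty."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.where(d < 0, np.exp(-d / 13.0) - 1.0,
                                 np.exp(d / 10.0) - 1.0).sum())

def rmse(rul_pred, rul_true):
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```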

Meanwhile, we evaluate health condition identification using the zero–one error, which measures the percentage of wrong assignments of the ordinal categories (Cheng et al. 2008). However, the zero–one error does not account for the prediction performance of each class. Thus, the precision, recall, and F1 score are additionally considered for evaluating health condition identification. These measures are defined as follows:

$$ \begin{aligned} Precision_{i} & = \frac{True\;positive_{i}}{True\;positive_{i} + False\;positive_{i}} \\ Recall_{i} & = \frac{True\;positive_{i}}{True\;positive_{i} + False\;negative_{i}} \\ F1\;score_{i} & = \frac{2 \times Precision_{i} \times Recall_{i}}{Precision_{i} + Recall_{i}} \end{aligned} $$
(5)

where i is one of the health condition classes.
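
These classification metrics are straightforward to compute from the decoded health labels; a small numpy sketch with illustrative helper names:

```python
import numpy as np

def zero_one_error(pred, true):
    """Fraction of cycles assigned to the wrong health condition category."""
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.mean(pred != true))

def per_class_prf(pred, true, label):
    """Precision, recall and F1 (Eq. 5) for one health condition class."""
    pred, true = np.asarray(pred), np.asarray(true)
    tp = np.sum((pred == label) & (true == label))
    fp = np.sum((pred == label) & (true != label))
    fn = np.sum((pred != label) & (true == label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```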

Experimental results

Experiment setting

In this study, the proposed multi-task framework utilizes the CNN architecture. A pre-processed matrix of sensor measurements is used as the input data. The matrix is composed of the selected sequential sensor signals over a time window of size \( N_{w} \); the matrix is scanned by multi-variate one-dimensional filters of length \( N_{l} \) sliding over the time steps, and the filters extract high-level feature vectors that capture the temporal patterns across the multiple signals.

Our CNN architecture consisted of four convolution layers and two FCLs. The number of kernel filters \( N_{f} \) was different for each convolution layer, while the two FCLs that serve as output layers for RUL estimation and healthy state identification had the same number of neurons \( N_{o} \). The dropout technique was imposed on each output layer to reduce the overfitting problem.

The exponential linear unit \( \left( {ELU} \right) \) activation function was used to perform the non-linear transformation of the high-level latent vectors, and the transformed results were used as the inputs of the next layer. The \( ELU \) function belongs to the family of rectified linear units (\( ReLU \)), which speed up learning and achieve strong performance. The \( ReLU \) is linear for positive arguments, which alleviates the vanishing gradient problem (Nair and Hinton 2010); however, for negative arguments its gradient is zero, so the corresponding weights cannot be updated once their inputs become negative. The \( ELU \) function instead lowers the output bound to − 1 (non-zero) and follows an exponential curve for negative arguments (Clevert et al. 2015). This allows the \( ELU \) function to produce negative outputs, which speeds up learning.

To find the optimal solution, gradient descent and back-propagation were adopted in our framework to update all the parameters in the network. For stochastic learning, the Adam optimizer and the mini-batch technique were used. The mini-batch size \( N_{b} \) was set to 256, and the loss function was optimized on each mini-batch. The total number of training epochs \( N_{e} \) was 500, and early stopping was implemented for efficient learning. The learning rate was 0.001 for the first 250 epochs and 0.0001 thereafter. All the hyperparameter values, set through trial and error, are presented in Table 2.
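
A compact training sketch matching this setup (Adam, mini-batches of 256, a learning-rate drop at the halfway point, early stopping on a validation loss). It reuses the hypothetical `MTCNN` and `multitask_loss` sketches from earlier sections; the patience value and the existence of a validation split are assumptions.

```python
import torch

def train(model, train_loader, val_loader, epochs=500, patience=30):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    best, wait = float("inf"), 0
    for epoch in range(epochs):
        if epoch == epochs // 2:                  # learning-rate drop: 0.001 -> 0.0001
            for group in opt.param_groups:
                group["lr"] = 1e-4
        model.train()
        for x, rul, health in train_loader:       # mini-batches of size 256
            opt.zero_grad()
            rul_pred, health_logits = model(x)
            multitask_loss(rul_pred, rul, health_logits, health).backward()
            opt.step()
        model.eval()                              # early stopping on validation loss
        val = 0.0
        with torch.no_grad():
            for x, rul, health in val_loader:
                rul_pred, health_logits = model(x)
                val += multitask_loss(rul_pred, rul, health_logits, health).item()
        if val < best:
            best, wait = val, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return model
```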

Table 2 Description of hyperparameter values

Baselines with hyper-parameter setting

The proposed multi-task CNN model (MT-CNN) is designed to provide accurate RUL estimation. To demonstrate the effectiveness of simultaneous learning, we first compared our MT-CNN to a single-task CNN (ST-CNN) that performs only RUL estimation. Then, five regression-based methods were used as baselines: a deep neural network (DNN), support vector regression (SVR), the extreme gradient boosting algorithm (XGBoost), a recurrent neural network (RNN), and an LSTM. Additionally, eight methods from the prognostic domain were compared with our model, namely RNN-based approaches (Malhotra et al. 2016; Lim et al. 2016; Yoon et al. 2017), CNN-based approaches (Babu et al. 2016; Li et al. 2018a, b), and an ensemble-based approach (Zhang et al. 2017).

DNN

The ANN is a basic neural network technique that reflects non-linear relationships between variables (Rosenblatt 1958), and more complex non-linear relationships can be captured by adding deeper layers. Five hidden layers were constructed in this study, with 16, 32, 64, 128, and 256 neurons in the respective hidden layers. To avoid overfitting, a dropout technique was employed with a dropout rate of 0.5.

XGBoost

XGBoost is an ensemble tree learning algorithm that combines decision trees with different weights in parallel (Chen and Guestrin 2016). A classification and regression tree (CART) is used as the base tree model in XGBoost, and the tree weights are updated by a gradient boosting algorithm. For the tree structure in this study, a maximum of 1000 leaves and a maximum individual tree depth of 100 were used. A learning rate of 0.06 and 300 boosting iterations were applied to update the tree weights.

RNN

The RNN has feedback connections that carry the hidden state across time steps and can thus handle sequential information (Williams and Zipser 1989). Because the sensor values of the C-MAPSS dataset are sequential, the RNN is well suited to the time-series RUL prediction task. In this study, an RNN model with 5 recurrent layers was organized so that its structural depth was identical to that of our proposed multi-task model. The Adam optimizer, a learning rate of 0.01, and 500 epochs were used to obtain the optimal parameter values.

LSTM

Because the RNN suffers from vanishing or exploding gradients during back-propagation, it has difficulty capturing long-term dependencies (Mikolov et al. 2010). The LSTM employs a gate structure that mitigates these gradient problems (Hochreiter and Schmidhuber 1997) and is thus regarded as an alternative to the RNN. Here, an LSTM model was constructed with the same structure as the RNN to provide a fair comparison.

Comparison with baselines

The comparison results with the other baselines are shown in Table 3. Our proposed MT-CNN and the ST-CNN demonstrated promising performance in both the RMSE and scoring evaluations, and MT-CNN showed the best performance on all the subsets. This result indicates that the simultaneous learning of related tasks is more effective than individual learning. In particular, our model is more effective in handling the complex sub-datasets (i.e., FD002 and FD004) than the simple ones (i.e., FD001 and FD003). Because the multi-task architecture has sufficient network capacity to reflect the information from the various tasks, our model achieved significant improvements in the more elaborate experimental environments.

Table 3 Performance comparison with other baselines

The RNN-based models show the next best performance because they capture the temporal patterns in the sequential sensor values. The LSTM produces better results than the RNN because of its modified internal structure. Note that we used the basic recurrent layer structure, whereas further optimized variants have been proposed in several studies (Malhotra et al. 2016; Lim et al. 2016; Yoon et al. 2017). XGBoost is not designed to capture temporal patterns and thus shows the weakest performance, despite its general reputation for strong performance.

For health condition identification, the zero–one error together with the precision, recall, and F1 score are used to measure the performance of this second task. In this study, the health condition has three categories, so a random assignment would be correct only about one third of the time. As shown in Table 4, our proposed model performs roughly twice as well as random selection in terms of the zero–one error, confirming that it achieves reasonable performance on the auxiliary task.

Table 4 Performance of health condition identification

Additionally, in terms of precision and recall, our proposed model shows low precision and high recall for the healthy state, high precision and low recall for the deteriorating state, and high precision and high recall for the unhealthy state. The results for the healthy and deteriorating states indicate that our model predicts the healthy state more often, and the deteriorating state less often, than they actually occur. This is because the sensor measurements near the end of life exhibit characteristic patterns whereas the early periods do not, so there are only a few distinctive sensor patterns that separate the healthy and deteriorating states. In contrast, the high precision and recall for the unhealthy state show that our proposed model specializes in capturing the characteristic patterns of the unhealthy state. Diagnosing near-failure conditions is also helpful for estimating the RUL, which predicts when failures will occur. Thus, the health condition identification task ultimately helps to improve RUL prediction performance.

Comparison with related works

The comparison results with previous studies that evaluated their frameworks on the C-MAPSS dataset are presented next. Table 5 summarizes the experimental results of recent research on the four sub-datasets of C-MAPSS (Malhotra et al. 2016; Lim et al. 2016; Yoon et al. 2017; Babu et al. 2016; Li et al. 2018a, b; Zhang et al. 2017). These studies applied various neural network architectures, including recurrent, convolutional, and ensemble networks.

Table 5 Performance comparison with related works

Our proposed multi-task model demonstrates outstanding performance compared to the related works. The CNN-based approach with a univariate one-dimensional filter produces very promising results but, unlike our model, considers only RUL information in the learning process. Apart from this model, most time-dependent models obtain better results than the others because of their advantage in capturing the temporal patterns of sequential sensor measurements. However, recurrent network–based approaches require relatively long computational time to train (Vaswani et al. 2017); since an RNN must be processed sequentially, GPU parallelism cannot be fully exploited in its computation (Bradbury et al. 2016). Hence, in integrated health management systems, our proposed multi-task architecture has great potential in terms of both performance and efficiency for RUL estimation.

Prediction power of unit-level

The purpose of our study is to increase prognostic performance for each engine unit. Even if the RMSE over all test samples is very small, the difference between the predicted and actual RUL values could still be large for some units. To demonstrate the forecasting power at the engine unit level, the prediction results of testing engine units in the FD002 subset are presented in Fig. 3. Among the units, the four testing units with the longest cycle histories are selected.

Fig. 3

Unit-level results of RUL predictions for testing units

It can be observed that the predicted values at every cycle are close to the actual RUL values. In particular, the difference between the actual and predicted values tends to be small when the cycles are close to the last record, showing that the sensor measurements of the end periods have characteristic patterns. As confirmed by the health condition identification performance, our model is good at differentiating the last period from normal ones. Accurate prediction in this last period can effectively save maintenance costs and enhance system reliability. This ability to distinguish the unique properties of healthy and unhealthy conditions is the reason for the superior performance of our MTL framework compared to independent learning: the framework can effectively reflect the characteristics of unhealthy conditions through the health condition identification task.

Conclusion

In most industries, good prognostic performance provides great benefits for maintenance decision-making. In this paper, a CNN-based MTL framework was proposed to accurately estimate the RUL by learning it simultaneously with health state identification. The health state labeling process was carried out before the MTL process. The health information was then combined into the MTL model for RUL estimation, in which the interdependencies of both tasks were considered by the RUL estimation network through the general features extracted in the shared network.

The proposed method was evaluated on the C-MAPSS dataset, and the prediction results on each sub-dataset were obtained and compared with those of methods in the related literature. Because the health state identification network captures the distinct characteristics of sensor measurements very close to failure, our model showed outstanding performance compared with various single-task models. Our CNN-based model achieved good prediction accuracy and has the additional advantage of lower computational complexity compared with recurrent neural architectures. Thus, the proposed framework shows promise for effective maintenance decision-making.

Specifically, various supportive techniques were employed in each part of our framework for effective and accurate prediction. Sensor selection and condition-wise normalization were applied to refine the data. A multi-task learning architecture was exploited to capture salient patterns of both tasks from a complex system, and a multi-variate one-dimensional filter was established to consider the temporal properties among the various sensor measurements. The dropout technique and the ELU function were used to avoid overfitting and to promote effective learning, respectively.

In future work, the proposed method will be extended to solve multi-objective problems in integrated health management systems. In addition, a state-of-the-art shallow network can be employed to reduce the number of parameters and accelerate computational speed. We anticipate further studies addressing such limitations in the future.