1 Introduction

Multivariate statistical techniques such as principal component analysis (PCA) (Harrou et al. 2013), partial least squares (PLS) (Yacoub and Macgregor 2004) and canonical variate analysis (CVA) (Ruiz-Cárcel et al. 2016) have been widely applied to detect abnormalities in large industrial systems. Multivariate subspace-identification models based on PCA, PLS or CVA have attracted attention over the past decades because they can be used for process monitoring, modelling and system identification. Juricek et al. (2005) demonstrated that system-identification models based on CVA outperform models based on regression methods such as PLS. Ruiz-Cárcel et al. (2015) showed that monitoring methods based on CVA are better suited to systems working under changing operating conditions than models based on PCA and PLS. The literature provides extensive examples of the application of CVA to industrial process modelling and health monitoring. Li et al. (2018) developed a prediction method based on CVA and support vector machines for modelling industrial reciprocating compressors. Larimore et al. (1993) proposed a state-space method using canonical variable states for modelling linear and nonlinear time series. Negiz and Cinar (1997) used a CVA-based subspace-identification approach to describe a high-temperature short-time milk-pasteurization process. CVA was also utilized by Li et al. (2018) to predict performance deterioration and estimate the behaviour of a system under faulty operating conditions; the performance of the method was illustrated in a large-scale three-phase flow facility.

Conventional multivariate subspace-identification approaches based on PCA or PLS assume that the process variables are linearly correlated and independent and identically distributed (IID) (Choi et al. 2006). These assumptions tend to limit the scope of many subspace-identification methods to linear processes operating under steady-state conditions. When the underlying assumptions are violated, for instance by nonlinear distortions, time dependency, system dynamics or varying operating conditions, the effectiveness of the modelling tools can degrade. It is therefore necessary to develop adaptive subspace-identification approaches for systems in which variations in the mode of operation and changes in the system dynamics are common.

A number of recursive monitoring methods have been proposed to address these limitations. Lane et al. (2003) proposed an extension of the conventional PCA method and illustrated the performance of the resulting recursive PCA model in a polymer film-manufacturing process. Choi et al. (2006) developed an adaptive multivariate statistical process control (MSPC) scheme for monitoring dynamic processes whose operating conditions vary. Lee and Lee (2008) proposed a recursive state-space model based on CVA; in that study, the norm of the difference between consecutive measurements was used to adjust the forgetting factors, and the calculation of the optimal values of the minimum and maximum forgetting factors was not detailed.

In this paper, we develop an adaptive monitoring tool based on CVA for the modelling of time-varying processes. We explore the ability of adaptive CVA to predict the behaviour of industrial rotating machines under slowly evolving faulty conditions. To obtain an accurate estimate of system outputs, forgetting factors calculated based on the residual between the model outputs and actual measurements are adopted to update the covariance and cross-covariance matrices of the system. The proposed method is validated on industrial data captured from an operational gas compressor.

2 Methodology

Given system input time-series \( u_{t} \) and output time-series \( y_{t} \), a linear state-space model can be built as follows (Qin 2006):

$$ x_{t + 1} = Bx_{t} + Cu_{t} + Ke_{t} $$
(1)
$$ y_{t} = Dx_{t} + Eu_{t} + e_{t} $$
(2)

where \( u_{t} \), \( y_{t} \) and \( x_{t} \) are the system inputs, system outputs and state vectors, respectively; \( B, C, D, E \) and \( K \) are model coefficient matrices; and \( e_{t} \) is zero-mean, normally distributed, independent white noise.
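To make the notation concrete, the following minimal sketch simulates Eqs. 1–2; the coefficient matrices and dimensions are arbitrary illustrative values, not taken from any identified system:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, n_y = 2, 1, 2  # assumed toy dimensions

# Hypothetical coefficient matrices for Eqs. 1-2.
B = np.array([[0.8, 0.1], [0.0, 0.7]])      # state transition
C = 0.5 * rng.normal(size=(n_x, n_u))       # input-to-state map
D = rng.normal(size=(n_y, n_x))             # state-to-output map
E = 0.1 * rng.normal(size=(n_y, n_u))       # direct feedthrough
K = 0.05 * rng.normal(size=(n_x, n_y))      # noise gain

T = 500
u = rng.normal(size=(n_u, T))               # system inputs u_t
x = np.zeros((n_x, T + 1))                  # state vectors x_t
y = np.zeros((n_y, T))                      # system outputs y_t

for t in range(T):
    e = 0.01 * rng.normal(size=n_y)                    # white noise e_t
    y[:, t] = D @ x[:, t] + E @ u[:, t] + e            # Eq. 2
    x[:, t + 1] = B @ x[:, t] + C @ u[:, t] + K @ e    # Eq. 1
```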

The objective of CVA is to maximize the correlation between two sets of variables (Russell et al. 2000). To generate the two data matrices from the measurements, the measurement vector is expanded at each sampling time by including \( a \) previous samples and \( b \) future samples to construct the past and future sample vectors \( z_{a, t} \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right) \cdot a} \) and \( u_{b, t} \in {\mathcal{R}}^{n_{u} \cdot b} \), where \( n_{y} \) and \( n_{u} \) are the numbers of output and input variables:

$$ z_{a, t} = \begin{bmatrix} y_{t - 1} \\ u_{t - 1} \\ y_{t - 2} \\ u_{t - 2} \\ \vdots \\ y_{t - a} \\ u_{t - a} \end{bmatrix} \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right) \cdot a} $$
(3)
$$ u_{b, t} = \begin{bmatrix} u_{t} \\ u_{t + 1} \\ \vdots \\ u_{t + b - 1} \end{bmatrix} \in {\mathcal{R}}^{n_{u} \cdot b} $$
(4)

Similarly, the observations can be expanded at each sampling time \( t \) to include \( a + 1 \) past samples, forming the extended past vector \( z_{a + 1, t} \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right) \cdot \left( a + 1 \right)} \):

$$ z_{a + 1, t} = \begin{bmatrix} y_{t - 1} \\ u_{t - 1} \\ y_{t - 2} \\ u_{t - 2} \\ \vdots \\ y_{t - a - 1} \\ u_{t - a - 1} \end{bmatrix} \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right) \cdot \left( a + 1 \right)} $$
(5)
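As an illustration, the stacked vectors of Eqs. 3–5 might be assembled from measurement arrays as in the following sketch; the function names are ours, and `y` and `u` are assumed to be \( n_{y} \times l \) and \( n_{u} \times l \) arrays:

```python
import numpy as np

def past_vector(y, u, t, a):
    """Past vector z_{a,t} of Eq. 3: the last a output and input samples,
    stacked as [y_{t-1}; u_{t-1}; ...; y_{t-a}; u_{t-a}]."""
    blocks = []
    for k in range(1, a + 1):
        blocks.append(y[:, t - k])
        blocks.append(u[:, t - k])
    return np.concatenate(blocks)

def future_vector(u, t, b):
    """Future input vector u_{b,t} of Eq. 4: [u_t; u_{t+1}; ...; u_{t+b-1}]."""
    return np.concatenate([u[:, t + k] for k in range(b)])

# The extended past vector z_{a+1,t} of Eq. 5 is simply past_vector(y, u, t, a + 1).
```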

To avoid the domination of variables with large absolute values, the past and future sample vectors are normalized to the zero-mean vectors \( \hat{z}_{a,t} \) and \( \hat{u}_{b,t} \). Then, the vectors \( \hat{z}_{a,t} \) and \( \hat{u}_{b,t} \) at different sampling times are rearranged to produce the reshaped matrices \( \hat{Z}_{a} \) and \( \widehat{U}_{b} \):

$$ \hat{Z}_{a} = \left[ \hat{z}_{a,t + 1} ,\hat{z}_{a,t + 2} , \ldots ,\hat{z}_{a,t + N} \right] \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right)a \times N} $$
(6)
$$ \widehat{U}_{b} = \left[ \hat{u}_{b,t + 1} ,\hat{u}_{b,t + 2} , \ldots ,\hat{u}_{b,t + N} \right] \in {\mathcal{R}}^{n_{u} b \times N} $$
(7)

where \( N = l - a - b + 1 \), and \( l \) is the total number of samples of the measurements \( y_{t} \). Cholesky decomposition of the sample covariance matrices of \( \hat{Z}_{a} \) and \( \widehat{U}_{b} \) is then used to construct a scaled Hankel matrix \( {\mathcal{H}} \). To find the linear combinations that maximize the correlation between the two sets of variables, \( {\mathcal{H}} \) is decomposed using singular value decomposition (SVD):

$$ {\mathcal{H}} = \varSigma_{b,b}^{ - 1/2} \varSigma_{b,a} \varSigma_{a,a}^{ - 1/2} = U\varSigma V^{T} $$
(8)

where \( \varSigma_{a,a} \) and \( \varSigma_{b,b} \) are the sample covariance matrices of \( \hat{Z}_{a} \) and \( \widehat{U}_{b} \), respectively, and \( \varSigma_{b,a} \) denotes their cross-covariance matrix. \( \varSigma_{a,a} \), \( \varSigma_{b,b} \) and \( \varSigma_{b,a} \) are calculated as follows (Odiowei and Yi 2010):

$$ \varSigma_{a,a} = \hat{Z}_{a} \hat{Z}_{a}^{T} /\left( {N - 1} \right) $$
(9)
$$ \varSigma_{b,b} = \widehat{U}_{b} \widehat{U}_{b}^{T} /\left( {N - 1} \right) $$
(10)
$$ \varSigma_{b,a} = \widehat{U}_{b} \hat{Z}_{a}^{T} /\left( {N - 1} \right) $$
(11)
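The normalization, reshaping and covariance computations of Eqs. 6–7 and 9–11 can be sketched as follows; random data stands in for real measurements, and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
p_dim, f_dim, N = 12, 6, 300          # (n_y + n_u)*a, n_u*b, number of columns
Z_a = rng.normal(size=(p_dim, N))     # stacked past vectors (Eq. 6, stand-in)
U_b = rng.normal(size=(f_dim, N))     # stacked future vectors (Eq. 7, stand-in)

# Normalize to zero mean to avoid domination by large-valued variables.
Z_hat = Z_a - Z_a.mean(axis=1, keepdims=True)
U_hat = U_b - U_b.mean(axis=1, keepdims=True)

# Sample covariance and cross-covariance matrices (Eqs. 9-11).
S_aa = Z_hat @ Z_hat.T / (N - 1)
S_bb = U_hat @ U_hat.T / (N - 1)
S_ba = U_hat @ Z_hat.T / (N - 1)
```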

\( U \), \( V \) and \( \varSigma \) have the following form:

$$ U = \left[ u_{1} ,u_{2} , \ldots ,u_{r} \right] \in {\mathcal{R}}^{n_{u} b \times n_{u} b} $$
$$ V = \left[ v_{1} ,v_{2} , \ldots ,v_{r} \right] \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right)a \times \left( n_{y} + n_{u} \right)a} $$
$$ \varSigma = \begin{bmatrix} d_{1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_{r} \end{bmatrix} \in {\mathcal{R}}^{n_{u} b \times \left( n_{y} + n_{u} \right)a} $$

The columns of \( U = \left[ u_{1} ,u_{2} , \ldots ,u_{r} \right] \) and \( V = \left[ v_{1} ,v_{2} , \ldots ,v_{r} \right] \) are the left-singular and right-singular vectors of \( {\mathcal{H}} \), respectively. \( \varSigma \) is a diagonal matrix whose diagonal elements, the singular values, quantify the degree of correlation between the corresponding left- and right-singular vectors. The right-singular vectors in \( V \) corresponding to the \( q \) largest singular values are retained in the truncated matrix \( V_{q} = \left[ v_{1} ,v_{2} , \ldots ,v_{q} \right] \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right)a \times q} \). This matrix is used later to perform dimension reduction on the measured data.
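A sketch of Eq. 8 and the truncation step follows, using Cholesky factors of the covariance matrices as inverse square roots; minimal stand-in data is regenerated so the block runs standalone, and the last two lines anticipate the projection of Eq. 12 below:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
p_dim, f_dim, N = 12, 6, 300
Z_hat = rng.normal(size=(p_dim, N))   # mean-centred past matrix (stand-in)
U_hat = rng.normal(size=(f_dim, N))   # mean-centred future matrix (stand-in)
S_aa = Z_hat @ Z_hat.T / (N - 1)
S_bb = U_hat @ U_hat.T / (N - 1)
S_ba = U_hat @ Z_hat.T / (N - 1)

# Cholesky factors: S = L L^T, so L^{-1} whitens the corresponding variables.
L_aa = cholesky(S_aa, lower=True)
L_bb = cholesky(S_bb, lower=True)

# Hankel matrix of Eq. 8: H = L_bb^{-1} S_ba L_aa^{-T}.
tmp = solve_triangular(L_bb, S_ba, lower=True)
H = solve_triangular(L_aa, tmp.T, lower=True).T

# SVD and truncation to the q dominant canonical directions.
_, s, Vt = np.linalg.svd(H)
q = 3                                  # number of retained states (illustrative)
V_q = Vt[:q, :].T                      # right-singular vectors, q largest values

# Projection matrix and canonical state variates (Eq. 12).
K_proj = V_q.T @ np.linalg.inv(L_aa)   # K = V_q^T S_aa^{-1/2}
Phi = K_proj @ Z_hat                   # columns are the canonical variates z_t
```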

With the truncated matrix \( V_{q} \), the \( \left( n_{y} + n_{u} \right)a \)-dimensional past matrix \( \hat{Z}_{a} \in {\mathcal{R}}^{\left( n_{y} + n_{u} \right)a \times N} \) is converted into a reduced \( q \)-dimensional matrix \( \varPhi \in {\mathcal{R}}^{q \times N} \), whose columns \( z_{t} \) are called the canonical state variates:

$$ \varPhi = \left[ {z_{t = 1} ,z_{t = 2} , \ldots ,z_{t = N} } \right] = K \cdot \hat{Z}_{a} = V_{q}^{T} \varSigma_{a,a}^{ - 1/2} \cdot \hat{Z}_{a} $$
(12)

where \( K = V_{q}^{T} \varSigma_{a,a}^{ - 1/2} \in {\mathcal{R}}^{{q \times \left( {n_{y} + n_{u} } \right){\text{a}}}} \) is the projection matrix that maps the past observations into the canonical variate space. In this investigation, the number of retained states \( q \) is determined in the same way as in the traditional CVA model. According to the literature (Odiowei and Yi 2010), if \( q \) is no less than the actual order of the system, the state variates \( x_{t} \) can be substituted with the canonical state variates \( z_{t} \). Therefore, the state variables are defined as a linear combination of the past measurement vector (Lee and Lee 2008):

$$ \hat{x}_{t + 1} = \left[ K\quad 0 \right]z_{a + 1, t + 1} $$
(13)
$$ \hat{x}_{t} = \left[ 0\quad K \right]z_{a + 1, t + 1} $$
(14)

where \( 0 \in {\mathcal{R}}^{{q \times \left( {n_{y} + n_{u} } \right)}} \) is a matrix of zeros. According to the literature (Shang et al. 2015), after the state variates have been estimated, the matrices \( B, C, D, E \) and \( K \) are obtained from the measurements through linear least-squares regression:

$$ \left[ D\quad E \right] = Y_{\left( :,1:N - a - 1 \right)} \begin{bmatrix} \hat{X}_{\left( :,1:N - a - 1 \right)} \\ U_{\left( :,1:N - a - 1 \right)} \end{bmatrix}^{ + } $$
(15)
$$ \left[ B\quad C\quad K \right] = \hat{X}_{\left( :,2:N - a \right)} \begin{bmatrix} \hat{X}_{\left( :,1:N - a - 1 \right)} \\ U_{\left( :,1:N - a - 1 \right)} \\ \hat{E}_{\left( :,1:N - a - 1 \right)} \end{bmatrix}^{ + } $$
(16)

where \( \hat{E} = Y - D\hat{X} - EU \).
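Under the above definitions, the least-squares estimation of Eqs. 15–16 could be implemented as in this sketch; random arrays again stand in for the state estimates \( \hat{X} \) and the aligned outputs \( Y \) and inputs \( U \):

```python
import numpy as np

rng = np.random.default_rng(2)
q, n_y, n_u, N, a = 3, 2, 1, 300, 5
X_hat = rng.normal(size=(q, N))       # state estimates from Eqs. 13-14 (stand-in)
Y = rng.normal(size=(n_y, N))         # outputs (stand-in)
U = rng.normal(size=(n_u, N))         # inputs (stand-in)

cols = slice(0, N - a - 1)

# Eq. 15: [D E] via the Moore-Penrose pseudoinverse.
DE = Y[:, cols] @ np.linalg.pinv(np.vstack([X_hat[:, cols], U[:, cols]]))
D, E = DE[:, :q], DE[:, q:]

# Innovation residuals E_hat = Y - D X_hat - E U.
E_hat = Y - D @ X_hat - E @ U

# Eq. 16: [B C K] via the pseudoinverse of the augmented regressor.
BCK = X_hat[:, 1:N - a] @ np.linalg.pinv(
    np.vstack([X_hat[:, cols], U[:, cols], E_hat[:, cols]]))
B, C, K = BCK[:, :q], BCK[:, q:q + n_u], BCK[:, q + n_u:]
```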

Due to non-stationary process behaviour, many industrial processes have time-varying characteristics that may cause rapid changes in the state variates over time. The sample covariance matrices \( \varSigma_{a,a} \) and \( \varSigma_{b,b} \) and the cross-covariance matrix \( \varSigma_{b,a} \) change as the operating conditions change, so constant covariance and cross-covariance matrices may not fully capture the system dynamics. Therefore, the exponentially weighted moving-average (EWMA) method is employed in this investigation to update the matrices \( \varSigma_{a,a} \), \( \varSigma_{b,b} \) and \( \varSigma_{b,a} \):

$$ \varSigma_{a,a\left( t \right)} = \left( {1 - \beta } \right)z_{a, t} z^{T}_{a, t} + \beta \varSigma_{{a,a\left( {t - 1} \right)}} $$
(17)
$$ \varSigma_{b,b\left( t \right)} = \left( {1 - \beta } \right)u_{b, t} u^{T}_{b, t} + \beta \varSigma_{{b,b\left( {t - 1} \right)}} $$
(18)
$$ \varSigma_{b,a\left( t \right)} = \left( {1 - \beta } \right)u_{b, t} z^{T}_{a, t} + \beta \varSigma_{{b,a\left( {t - 1} \right)}} $$
(19)

where \( \beta \) is the forgetting factor, which is calculated from the Euclidean norm of the residual between the predicted model outputs and the actual measurements. The initial values of \( \varSigma_{a,a\left( t \right)} \), \( \varSigma_{b,b\left( t \right)} \) and \( \varSigma_{b,a\left( t \right)} \) are determined by the traditional CVA model. Tracking time-varying parameters is an important problem in subspace modelling: a constant forgetting factor is not suitable for tracking time-varying parameters and therefore cannot fully reflect the dynamics of a process under nonstationary conditions (Leung and So 2005). The forgetting factor must therefore change with the rate of process change to yield satisfactory predictions in time-varying environments. In this investigation, the forgetting factor is adjusted based on the Euclidean norm of the residual between the predicted model outputs and the actual measurements, as in (Shang et al. 2015). A small forgetting factor gives more weight to present observations, reducing the impact of past observations on the current model. As the forgetting factor approaches unity, more weight is given to past measurements, giving the model long-term memory. The forgetting factor used in this study is calculated as follows:

$$ \beta_{t} = Ae^{{ - \parallel e_{t - 1} \parallel /\sigma_{1} }} $$
(20)

where \( A \) is a constant. The empirical parameter-selection procedure proposed in (Choi et al. 2006; Shang et al. 2015) is adopted in this study to determine the value of \( A \); typically, a value between 0.9 and 0.999 is selected. \( \parallel e_{t - 1} \parallel \) denotes the Euclidean norm of the residual between the actual measurements and the model outputs. The tuning parameter \( \sigma_{1} \) controls the sensitivity of the model to prediction errors: the larger \( \sigma_{1} \) is, the less sensitive the model is to prediction error. The value of the forgetting factor \( \beta_{t} \) is adjusted at every time instant when new measurements become available.
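The adaptive forgetting factor of Eq. 20 and the EWMA updates of Eqs. 17–19 reduce to a few lines; a sketch, with helper names of our own choosing:

```python
import numpy as np

def forgetting_factor(e_prev, A=0.97, sigma1=3.5):
    """Adaptive forgetting factor beta_t of Eq. 20; A and sigma1 default to
    the values used in the case study of Sect. 3."""
    return A * np.exp(-np.linalg.norm(e_prev) / sigma1)

def ewma_update(S_prev, v, w, beta):
    """One EWMA update of Eqs. 17-19: S_t = (1 - beta) v w^T + beta S_{t-1}."""
    return (1.0 - beta) * np.outer(v, w) + beta * S_prev

# Usage at time t, with z_at and u_bt the current past and future vectors:
#   beta = forgetting_factor(residual)
#   S_aa = ewma_update(S_aa, z_at, z_at, beta)   # Eq. 17
#   S_bb = ewma_update(S_bb, u_bt, u_bt, beta)   # Eq. 18
#   S_ba = ewma_update(S_ba, u_bt, z_at, beta)   # Eq. 19
```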

After the forgetting factor is determined, weighted recursive least squares (WRLS) with an adaptive forgetting factor (Leung and So 2005; Turksoy et al. 2014) can be used to update the model coefficient matrices \( B, C, D, E \) and \( K \). The system described by Eqs. 1 and 2 is rewritten as follows:

$$ y_{t} = \varTheta_{t}^{y} \phi_{t} + e_{t} $$
(21)
$$ \hat{x}_{t + 1} = \varTheta_{t}^{x} \varPsi_{t} $$
(22)

where \( \varTheta_{t}^{y} = \left[ D_{t} \quad E_{t} \right] \), \( \phi_{t} = \left[ \hat{x}_{t}^{T} \quad u_{t}^{T} \right]^{T} \), \( \varTheta_{t}^{x} = \left[ B_{t} \quad C_{t} \quad K_{t} \right] \) and \( \varPsi_{t} = \left[ \hat{x}_{t}^{T} \quad u_{t}^{T} \quad e_{t}^{T} \right]^{T} \). \( \varTheta_{t}^{y} \) can be calculated using recursive least squares (RLS) (Turksoy et al. 2013):

$$ \varTheta_{t}^{y} = \varTheta_{t - 1}^{y} + \left( y_{t} - \varTheta_{t - 1}^{y} \phi_{t} \right)\phi_{t}^{T} P_{t} $$
(23)
$$ P_{t} = \frac{1}{\beta }\left( P_{t - 1} - \frac{P_{t - 1} \phi_{t} \phi_{t}^{T} P_{t - 1} }{\beta + \phi_{t}^{T} P_{t - 1} \phi_{t} } \right) $$
(24)

The innovation noise sequence is defined as:

$$ e_{t} = y_{t} - \varTheta_{t}^{y} \phi_{t} $$
(25)

Similarly, \( \varTheta_{t}^{x} \) is calculated as follows:

$$ \varTheta_{t}^{x} = \varTheta_{t - 1}^{x} + \left( \hat{x}_{t + 1} - \varTheta_{t - 1}^{x} \varPsi_{t} \right)\varPsi_{t}^{T} Q_{t} $$
(26)
$$ Q_{t} = \frac{1}{\beta }\left( Q_{t - 1} - \frac{Q_{t - 1} \varPsi_{t} \varPsi_{t}^{T} Q_{t - 1} }{\beta + \varPsi_{t}^{T} Q_{t - 1} \varPsi_{t} } \right) $$
(27)
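Both coefficient updates share the same WRLS recursion, so Eqs. 23–27 can be captured by a single helper; a sketch, with a function name of our own choosing:

```python
import numpy as np

def wrls_step(Theta, P, target, phi, beta):
    """One WRLS step with forgetting factor beta.

    Covers Eqs. 23-25 with (Theta, P, target, phi) = (Theta_y, P_t, y_t, phi_t)
    and Eqs. 26-27 with (Theta_x, Q_t, x_hat_{t+1}, Psi_t)."""
    residual = target - Theta @ phi                  # innovation (Eq. 25)
    P = (P - np.outer(P @ phi, phi @ P) / (beta + phi @ P @ phi)) / beta  # Eq. 24/27
    Theta = Theta + np.outer(residual, phi) @ P      # Eq. 23/26
    return Theta, P, residual
```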

The procedures for subspace identification and performance estimation using the model described above are summarized as follows:

  • Step 1: Calculate model coefficient matrices using the traditional CVA model.

  • Step 2: Calculate the forgetting factor \( \beta_{t} \) as per Eq. 20.

  • Step 3: Compute the updated covariance and cross-covariance matrices \( \varSigma_{a,a\left( t \right)} \), \( \varSigma_{b,b\left( t \right)} \) and \( \varSigma_{b,a\left( t \right)} \) according to Eqs. 17–19.

  • Step 4: Update the Hankel matrix \( {\mathcal{H}} \) as per Eq. 8.

  • Step 5: Estimate the state vectors \( \hat{x}_{t + 1} \) and \( \hat{x}_{t} \) as per Eqs. 13–14.

  • Step 6: Update the model coefficient matrices via Eqs. 23–27.

  • Step 7: Estimate the model outputs \( y_{t} \) according to Eqs. 1–2.

  • Step 8: Update the forgetting factor \( \beta_{t} \) based on the residual between the estimated outputs and actual measurements.

  • Step 9: Repeat Steps 2–8 iteratively as new measurements become available (a simplified loop is sketched below).
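To illustrate how the steps interact, the following self-contained toy example tracks a slowly drifting scalar gain using the adaptive forgetting factor and the WRLS recursion. It is a deliberately simplified instance (a scalar system with no covariance or SVD refresh), not the compressor model of the next section:

```python
import numpy as np

rng = np.random.default_rng(3)
A_const, sigma1 = 0.97, 3.5
T = 400
u = rng.normal(size=T)
theta_true = np.linspace(1.0, 2.0, T)        # slowly drifting "fault" gain
y = theta_true * u + 0.05 * rng.normal(size=T)

theta = 0.0                                   # Step 1: initial offline estimate
P = 1000.0                                    # scalar inverse-correlation term
residual = 0.0

for t in range(T):
    beta = A_const * np.exp(-abs(residual) / sigma1)  # Steps 2/8 (Eq. 20)
    y_pred = theta * u[t]                             # Step 7: output estimate
    residual = y[t] - y_pred                          # innovation (Eq. 25)
    P = (P - P * u[t] * u[t] * P / (beta + u[t] * P * u[t])) / beta  # Eq. 24
    theta = theta + residual * u[t] * P               # Step 6 (Eq. 23)

print(f"final gain estimate: {theta:.3f}, true final gain: {theta_true[-1]:.3f}")
```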

3 Case Study

Rotating machines that operate at high speed and under high pressure are subject to performance degradation and failures. If a fault occurs and the fault evolution is slow, the machine operator may choose to keep the machine running until repair facilities and spare parts are available at the plant. In such a case, the proposed adaptive CVA (ACVA) model can be used to estimate how the system will behave under faulty operating conditions given future system inputs. In this section, the proposed method is applied to an operational industrial centrifugal compressor to predict the performance of the machine during bearing degradation.

The measured time series from compressor A consisted of 368 observations of 13 variables, captured at a sampling rate of one sample per hour. Table 1 summarizes the measured variables for this compressor. As shown in Fig. 1, the compressor operated under healthy conditions during the first 320 samples. The readings of the four bearing-temperature sensors started to rise at around the 321st sampling point, and the machine continued to run until the 368th sampling point, at which time site engineers shut down the compressor for inspection and maintenance. To compare the performance of the developed ACVA approach with that of the traditional CVA model, the first 240 sampling points of the monitored time series were used to build an offline CVA model. The constructed CVA model was then fed with the speed set points used throughout the degradation process to estimate how the system was affected by the fault. In parallel, the developed adaptive CVA approach was employed to update the constructed model iteratively according to Steps 2–8 described in Sect. 2. The predicted outputs obtained from the adaptive CVA model were compared with those obtained from the traditional CVA model to evaluate the performance of the proposed adaptive monitoring method.

Table 1. Measured variables of compressor A
Fig. 1. Trend of the four bearing-temperature sensor measurements of compressor A.

To determine the optimal number of retained states \( q \), the trained offline CVA model was first used to predict the system outputs for the data captured during the early stages of degradation. The predicted outputs were compared with the actual measurements, and the mean absolute error (MAE) over all output variables is plotted against the number of retained states in Fig. 2. The figure shows that \( q = 1 \) gives the lowest prediction error; therefore, \( q \) was set to 1 to obtain the model with the highest predictive accuracy. The value of \( A \) was set to 0.97 according to the empirical parameter-selection procedure proposed in (Choi et al. 2006; Shang et al. 2015). The value of the forgetting factor \( \beta_{t} \) is updated at each time instant based on the difference between the predicted system outputs and the actual measurements: when the residual is large, \( \beta_{t} \) decreases to achieve faster identification with short memory; when the residual is small, using more information about the past improves the prediction accuracy of the model. The tuning parameter \( \sigma_{1} \) was set to 3.5 in this study, the minimum value that ensures convergence of the model while maximizing its sensitivity to prediction errors.
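The order selection described above amounts to a simple error metric scanned over candidate values of \( q \); a sketch, where `predict(q)` stands for the offline CVA prediction procedure of Sect. 2 and is a placeholder, not a published API:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over all output variables and time steps."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical scan: y_val holds the early-degradation measurements.
# errors = [mae(y_val, predict(q)) for q in range(1, 11)]
# q_best = int(np.argmin(errors)) + 1   # q = 1 in this case study
```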

Fig. 2. MAE over all output variables for different numbers of retained states \( q \).

Figures 3, 4, 5 and 6 show the forecasted outputs of the adaptive CVA and traditional CVA models. The adaptive CVA model tracks changes in the bearing-temperature measurements more accurately than the traditional CVA method. Table 2 summarizes the mean absolute percentage error (MAPE) of the developed ACVA model and the conventional CVA model. These results imply that the proposed method takes advantage of recursive state-space modelling to reveal the correlation between system input and output signals, thereby making adaptive CVA more sensitive to bearing degradation than traditional CVA models.

Fig. 3. Radial bearing temperature 1 under faulty operating conditions as predicted by the adaptive CVA and traditional CVA models.

Fig. 4. Radial bearing temperature 2 under faulty operating conditions as predicted by the adaptive CVA and traditional CVA models.

Fig. 5. Active thrust bearing temperature under faulty operating conditions as predicted by the adaptive CVA and traditional CVA models.

Fig. 6. Inactive thrust bearing temperature under faulty operating conditions as predicted by the adaptive CVA and traditional CVA models.

Table 2. Mean absolute percentage error (%) for different output variables

4 Conclusion

This paper proposes an adaptive CVA modelling tool to improve the predictive accuracy of traditional CVA methods. A variable forgetting factor is adopted to update the model coefficient matrices and the covariance and cross-covariance matrices according to the residuals of the model outputs, allowing the model to track rapid changes in the system outputs. Condition-monitoring data captured from an operational industrial compressor were used to validate the proposed method, and the outputs predicted by the adaptive CVA model coincide closely with the actual measurements. The proposed method takes advantage of recursive state-space modelling to enhance the prognostic performance of CVA and increase its sensitivity to bearing deterioration. It can provide site engineers with more reliable and robust performance estimates of systems operating under varying and abnormal conditions. This information can be used to forecast the impact of a fault on the operational process and to develop appropriate production plans and optimal maintenance strategies, thereby making plant operations safer, more productive and more profitable.