
1 Introduction

Batch processes are widely used in modern industries such as biochemistry, food and pharmaceuticals [3]. In a batch process, raw materials are added in batches and the whole process can be divided into a number of stages in which different products are made. To ensure process safety, monitoring methods have received much attention. Monitoring methods such as principal component analysis (PCA) [6] and support vector domain description (SVDD) [2] use only process variables to train the model and detect abnormality. Besides process variables, quality variables can be obtained at the end of each batch. Partial least squares (PLS) [7] has therefore attracted attention because it considers both process variables and quality variables.

PLS finds a series of projection directions for both process variables and quality variables, chosen so that the projections of the process variables are highly correlated with the projections of the quality variables. These projection directions are found through several iterations. The projected vectors and the residual vector are then obtained, and both the \( T^{2} \) and SPE statistics can be derived. These statistics are compared with predefined control limits to judge the condition of the process. In addition, kernel partial least squares (KPLS) is introduced when the relationship between process variables and quality variables is nonlinear.

In on-line monitoring, only data up to the current time point are available, which is a kind of missing data problem [5]. To deal with missing data, research has been carried out on estimating the statistics when the data are incomplete. One approach fills the missing data with predefined values, for instance the average of the training data or the most recently observed data. Another approach exploits the characteristics of the monitoring method. For PLS, the projected vectors can be estimated in a least-squares sense, after which the \( T^{2} \) and SPE statistics can be calculated. However, if a nonlinear kernel such as the Gaussian kernel is chosen in KPLS, the least-squares estimation cannot be used, because the kernel value cannot be decomposed into a sum of kernels over the individual time slices.

The additive kernel [4, 10] is a special kind of kernel. It considers nonlinear relationships among the variables within every time slice, and the value of a generalized additive kernel can be decomposed into a sum of kernels over the time slices. Thanks to this property, the least-squares estimation of the projected vectors can still be used without filling in the missing data. In this paper, we introduce the additive kernel into KPLS [9] and name the result AKPLS, so that the projected vectors can be estimated without filling in the missing data.

2 Kernel Partial Least Square Based Batch Processes Monitoring

Sample data in batch processes are usually expressed as a three-dimensional matrix \( {\underline{\mathbf{X} }}(I \times J \times K) \), where I denotes the number of batches, J the number of variables at each time point, and K the number of time slices in a batch. Before applying monitoring methods, the data must be preprocessed [8]. In the preprocessing step, the three-dimensional matrix \( {\underline{\mathbf{X} }} \) is unfolded into a two-dimensional matrix \( {\mathbf{X}}(I \times JK) \); Fig. 1 illustrates the batch-wise unfolding. The variables are then centered and scaled, yielding the process variable matrix \( {\mathbf{X}} \) and the quality variable matrix \( {\mathbf{Y}} \).

Fig. 1 Batch-wise unfolding process
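
As a concrete illustration, the following numpy sketch performs batch-wise unfolding and autoscaling. The function name and the time-major ordering of the unfolded columns (all J variables of slice 1, then slice 2, and so on) are illustrative choices, not requirements.

```python
import numpy as np

def batchwise_unfold_and_scale(X3d):
    """Unfold an (I x J x K) batch array into (I x JK) and autoscale each column.

    X3d : ndarray of shape (I, J, K) -- batches x variables x time slices.
    Returns the scaled matrix plus the column means/stds needed to
    preprocess new batches in the same way.
    """
    I, J, K = X3d.shape
    # Batch-wise unfolding: each row holds the J variables of slice 1,
    # then slice 2, ..., then slice K.
    X = X3d.transpose(0, 2, 1).reshape(I, J * K)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std, mean, std
```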

PLS models linear relationships between \( {\mathbf{X}} \) and \( {\mathbf{Y}} \) as in (1), where \( t_{i} \) denotes the projection score, \( {\mathbf{p}}_{i} \) and \( {\mathbf{q}}_{i} \) are loading vectors, and \( {\mathbf{E}} \) and \( {\mathbf{F}} \) are residuals.

$$ {\mathbf{X}} = \sum\limits_{i = 1}^{n} {t_{i} } {\mathbf{p}}_{i}^{T} + {\mathbf{E}},{\mathbf{Y}} = \sum\limits_{i = 1}^{n} {t_{i} } {\mathbf{q}}_{i}^{T} + {\mathbf{F}} $$
(1)

For each preprocessed test sample \( {\mathbf{x}} \in R^{m \times 1} \), we can derive \( n \) projection scores and a residual vector. The iterative procedure that derives the statistics from a complete test sample with PLS is summarized as follows (a minimal code sketch is given after the list):

  1. \( {\mathbf{x}}_{1} = {\mathbf{x}} \), \( i = 1 \).

  2. While \( i \le n \), repeat step 3.

  3. \( t_{i} = {\mathbf{x}}_{i}^{T} {\mathbf{w}}_{i} \), \( {\mathbf{x}}_{i + 1} = {\mathbf{x}}_{i} - t_{i} {\mathbf{p}}_{i} \), \( i = i + 1 \).

  4. Set \( {\mathbf{t}} = [t_{1} \;\; t_{2} \;\; \ldots \;\; t_{n} ]^{T} \) as the projected vector and \( {\mathbf{x}}_{n + 1} \) as the residual vector.

  5. Set \( {\mathbf{t}}^{T} {\varvec{\Lambda}}^{ - 1} {\mathbf{t}} \) as the \( T^{2} \) statistic, where \( {\varvec{\Lambda}} \) denotes the diagonal matrix of the variances of the projection scores. Set the SPE statistic as the squared norm of \( {\mathbf{x}}_{n + 1} \).
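
A minimal sketch of this procedure, assuming the weight vectors \( {\mathbf{w}}_{i} \), the loadings \( {\mathbf{p}}_{i} \) and the training score variances (the diagonal of \( {\varvec{\Lambda}} \)) are already available from training; all names are illustrative.

```python
import numpy as np

def pls_statistics(x, W, P, score_var):
    """T^2 and SPE for one complete, preprocessed test sample.

    x         : (m,) preprocessed test vector
    W, P      : (m, n) weight and loading matrices from PLS training
    score_var : (n,) variances of the training scores (diagonal of Lambda)
    """
    n = W.shape[1]
    t = np.zeros(n)
    xi = x.copy()
    for i in range(n):               # steps 2-3: score, then deflate
        t[i] = xi @ W[:, i]
        xi = xi - t[i] * P[:, i]
    T2 = np.sum(t**2 / score_var)    # step 5: t' Lambda^{-1} t
    SPE = np.sum(xi**2)              # squared norm of the residual x_{n+1}
    return T2, SPE
```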

The control limit of \( T^{2} \) can be estimated from the F-distribution, and the control limit of SPE from a scaled chi-squared distribution, as in (2), where N denotes the number of training samples, n the number of projection directions, and \( \alpha \) the significance level; m and v denote the mean and variance of the SPE values calculated from the training data.

$$ \begin{aligned} & T^{2} \sim \frac{{n(N^{2} - 1)}}{{N(N - n)}}F_{n,\;N - n,\;\alpha } \\ & SPE \sim g\chi_{h,\alpha }^{2} ,\quad g = v/(2m),\quad h = 2m^{2} /v \\ \end{aligned} $$
(2)
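
For example, the limits in (2) can be obtained with scipy.stats; the sketch below follows (2) directly (function and variable names are illustrative).

```python
import numpy as np
from scipy.stats import chi2, f

def control_limits(SPE_train, n_components, alpha=0.01):
    """Control limits of T^2 and SPE according to (2).

    SPE_train    : (N,) SPE values of the N training samples
    n_components : number of projection directions n
    alpha        : significance level
    """
    N = len(SPE_train)
    n = n_components
    # T^2 limit from the F-distribution
    T2_lim = n * (N**2 - 1) / (N * (N - n)) * f.ppf(1 - alpha, n, N - n)
    # SPE limit from a scaled chi-squared distribution
    m = np.mean(SPE_train)
    v = np.var(SPE_train, ddof=1)
    g, h = v / (2 * m), 2 * m**2 / v
    SPE_lim = g * chi2.ppf(1 - alpha, h)
    return T2_lim, SPE_lim
```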

In batch processes, nonlinear relationships exist between process variables and quality variables. We therefore introduce a kernel and obtain KPLS by replacing \( {\mathbf{X}} \) with \( \Phi ({\mathbf{X}}) \), as defined in (3).

$$ \begin{aligned} &\Phi ({\mathbf{X}}) = \left[ {\begin{array}{*{20}c} {\Phi ({\mathbf{x}}_{1} )} & {\Phi ({\mathbf{x}}_{2} )} & \ldots & {\Phi ({\mathbf{x}}_{N} )} \\ \end{array} } \right]^{T} \\ & Ker({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) =\Phi ({\mathbf{x}}_{i} )^{T}\Phi ({\mathbf{x}}_{j} ) \\ \end{aligned} $$
(3)

In on-line monitoring of batch processes, data after the current time point are unavailable. If a traditional nonlinear kernel such as the Gaussian or polynomial kernel is chosen, the missing data must be filled in before the monitoring statistics can be calculated, and inappropriate padding may degrade the monitoring result. In the next section, a special kernel called the additive kernel is introduced to solve this problem.

3 Batch Process Monitoring Based on Additive Kernel Partial Least Square (AKPLS)

In this section, we introduce the additive kernel (AK) and describe the training and on-line monitoring procedures of AKPLS.

3.1 Additive Kernel (AK)

The additive kernel (AK) is a special kind of kernel whose feature map is defined in (4), where K denotes the number of time slices.

$$ \Phi ({\mathbf{x}}_{i} ) = \left[ {\begin{array}{*{20}c} {\phi ({\mathbf{x}}_{i,1} )^{T} } & {\phi ({\mathbf{x}}_{i,2} )^{T} } & \ldots & {\phi ({\mathbf{x}}_{i,K} )^{T} } \\ \end{array} } \right]^{T} $$
(4)

Data from different time slices are mapped independently in \( \Phi ({\mathbf{x}}_{i} ) \), so the value of the whole kernel decomposes into a sum of K per-slice kernels. Different kernels can be selected at different time slices to model nonlinear relationships among the variables. If the kernel at every time slice is a Gaussian kernel, the result is called the additive Gaussian kernel, as in (5).

$$ \begin{aligned} & Ker({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) =\Phi ({\mathbf{x}}_{i} )^{T}\Phi ({\mathbf{x}}_{j} ) = \sum\limits_{k = 1}^{K} {\exp \left( { - \left\| {{\mathbf{x}}_{i,k} - {\mathbf{x}}_{j,k} } \right\|^{2} /r} \right)} \\ & \phi ({\mathbf{x}}_{i,k} )^{T} \phi ({\mathbf{x}}_{j,k} ) = \exp \left( { - \left\| {{\mathbf{x}}_{i,k} - {\mathbf{x}}_{j,k} } \right\|^{2} /r} \right) \\ \end{aligned} $$
(5)
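
A short sketch of (5), in which an unfolded sample is reshaped back into its K time slices before the per-slice Gaussian kernels are summed; the slice ordering and the width parameter r are illustrative.

```python
import numpy as np

def additive_gaussian_kernel(xi, xj, K, r):
    """Additive Gaussian kernel between two unfolded samples of length J*K.

    Each sample is split into K slices of J variables and the kernel is
    the sum of per-slice Gaussian kernels, as in (5).
    """
    xi_slices = xi.reshape(K, -1)  # (K, J), assuming time-major unfolding
    xj_slices = xj.reshape(K, -1)
    sq_dist = np.sum((xi_slices - xj_slices) ** 2, axis=1)
    return np.sum(np.exp(-sq_dist / r))
```

Truncating the sum after the first k slices gives the partial kernel value that underlies the on-line estimation in Sect. 3.3.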

3.2 Training Process of AKPLS

Additive kernel partial least squares (AKPLS) is the special case of KPLS in which the feature map \( \Phi ({\mathbf{x}}_{i} ) \) satisfies (4). The training procedure of AKPLS is summarized as Algorithm 1.

Algorithm 1: Training part of AKPLS

Input: \( {\mathbf{X}} \), the process variable matrix; \( {\mathbf{Y}} \), the quality variable matrix; n, the number of iterations.

  1. Set \( \Phi ({\mathbf{X}})_{1} =\Phi ({\mathbf{X}})({\mathbf{I}} - \frac{1}{h}{\mathbf{1}}_{{\mathbf{I}}} ) \), where \( {\mathbf{I}} \) denotes the identity matrix, \( {\mathbf{1}}_{{\mathbf{I}}} \) a matrix whose elements are all 1, and \( h \) the dimension of the matrix; \( \Phi ({\mathbf{X}}) \) is defined in (3). Set \( {\mathbf{Y}}_{1} = {\mathbf{Y}} \) and \( i = 1 \).

  2. If \( i \le n \), do steps 3, 4, 5 and 6; otherwise go to step 7.

  3. Set \( {\mathbf{t}}_{i} \) as the eigenvector corresponding to the largest eigenvalue of \( \Phi ({\mathbf{X}})_{i}\Phi ({\mathbf{X}})_{i}^{T} {\mathbf{Y}}_{i} {\mathbf{Y}}_{i}^{T} \).

  4. Calculate \( {\mathbf{c}}_{i} \), \( {\mathbf{u}}_{i} \), \( {\mathbf{w}}_{i} \), \( {\mathbf{t}}_{i} \) using \( {\mathbf{c}}_{i} = {\mathbf{Y}}_{i}^{T} {\mathbf{t}}_{i} \), \( {\mathbf{c}}_{i} = {\mathbf{c}}_{i} /\left\| {{\mathbf{c}}_{i} } \right\| \), \( {\mathbf{u}}_{i} = {\mathbf{Y}}_{i} {\mathbf{c}}_{i} \), \( {\mathbf{w}}_{i} =\Phi ({\mathbf{X}})_{i}^{T} {\mathbf{u}}_{i} \), \( {\mathbf{w}}_{i} = {\mathbf{w}}_{i} /\left\| {{\mathbf{w}}_{i} } \right\| \), \( {\mathbf{t}}_{i} =\Phi ({\mathbf{X}})_{i} {\mathbf{w}}_{i} \).

  5. Calculate \( {\mathbf{p}}_{i} \), \( \Phi ({\mathbf{X}})_{i + 1} \), \( {\mathbf{Y}}_{i + 1} \) using \( {\mathbf{p}}_{i} = \frac{{\Phi ({\mathbf{X}})_{i}^{T} {\mathbf{t}}_{i} }}{{{\mathbf{t}}_{i}^{T} {\mathbf{t}}_{i} }} \), \( {\mathbf{q}}_{i} = \frac{{{\mathbf{Y}}_{i}^{T} {\mathbf{t}}_{i} }}{{{\mathbf{t}}_{i}^{T} {\mathbf{t}}_{i} }} \), \( \Phi ({\mathbf{X}})_{i + 1} =\Phi ({\mathbf{X}})_{i} - {\mathbf{t}}_{i} {\mathbf{p}}_{i}^{T} \), \( {\mathbf{Y}}_{i + 1} = {\mathbf{Y}}_{i} - {\mathbf{t}}_{i} {\mathbf{q}}_{i}^{T} \).

  6. Set \( i = i + 1 \) and return to step 2.

  7. After n iterations, obtain \( {\mathbf{w}}_{i} ,{\mathbf{p}}_{i}\; (i = 1,2, \ldots ,n) \).

  8. Use \( \Phi ({\mathbf{X}})_{n + 1} \) and \( {\mathbf{t}}_{i} \; (i = 1,2, \ldots ,n) \) to calculate the \( SPE_{k} \), \( \overline{SPE}_{k} \) and \( T^{2} \) statistics of the training data and set the control limits of these statistics.

Output: \( {\mathbf{w}}_{i} ,{\mathbf{p}}_{i}\; (i = 1,2, \ldots n) \). Control limit of \( SPE_{k} \), \( \overline{SPE}_{k} \), \( T^{2} \).

In Algorithm 1, the statistic \( T^{2} \) is defined as \( {\mathbf{t}}^{T} {\varvec{\Lambda}}^{ - 1} {\mathbf{t}} \), where \( {\varvec{\Lambda}} \) is the diagonal matrix of the variances of the scores associated with the n projection directions. \( SPE_{k} \) denotes the sum of squares of the part of the residual vector corresponding to time point k, and \( \overline{SPE}_{k} = \sum\nolimits_{i = 1}^{k} {SPE_{i} } /k \). The control limits of these statistics are given in (2).
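
The sketch below follows Algorithm 1 literally under the assumption of an explicit, finite-dimensional per-slice feature map, so that \( \Phi ({\mathbf{X}}) \) can be stored as a matrix; with an implicit map such as the additive Gaussian kernel, the same loop is carried out through kernel matrices instead. All names are illustrative.

```python
import numpy as np

def akpls_train(Phi, Y, n):
    """NIPALS-style training loop of Algorithm 1 on an explicit feature matrix.

    Phi : (N, d) mean-centered feature matrix; row i stacks the per-slice
          features of batch i as in (4)
    Y   : (N, q) quality-variable matrix
    n   : number of latent components
    """
    W, P, T = [], [], []
    Phi_i, Y_i = Phi.copy(), Y.copy()
    for _ in range(n):
        # step 3: dominant eigenvector of Phi_i Phi_i' Y_i Y_i'
        M = Phi_i @ Phi_i.T @ Y_i @ Y_i.T
        vals, vecs = np.linalg.eig(M)
        t = np.real(vecs[:, np.argmax(np.real(vals))])
        # step 4
        c = Y_i.T @ t
        c /= np.linalg.norm(c)
        u = Y_i @ c
        w = Phi_i.T @ u
        w /= np.linalg.norm(w)
        t = Phi_i @ w
        # step 5: loadings and deflation
        p = Phi_i.T @ t / (t @ t)
        q = Y_i.T @ t / (t @ t)
        Phi_i = Phi_i - np.outer(t, p)
        Y_i = Y_i - np.outer(t, q)
        W.append(w); P.append(p); T.append(t)
    # residual Phi_{n+1} is what the SPE statistics are built from
    return np.column_stack(W), np.column_stack(P), np.column_stack(T), Phi_i
```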

3.3 On-line Monitoring of AKPLS

In on-line monitoring, the variables after the current time slice are unknown. In this part, we discuss how AKPLS estimates the monitoring statistics without filling in the unknown data.

In AKPLS, \( \Phi ({\mathbf{x}}_{j} ) \) can be decomposed as (6). At time point k, \( \Phi ^{(1,k)} ({\mathbf{x}}_{j} ) \) is known while \( \Phi ^{(k + 1,K)} ({\mathbf{x}}_{j} ) \) is unknown.

$$ \begin{aligned} &\Phi ({\mathbf{x}}_{j} ) = \left[ {\begin{array}{*{20}c} {\Phi ^{(1,k)} ({\mathbf{x}}_{j} )^{T} } & {\Phi ^{(k + 1,K)} ({\mathbf{x}}_{j} )^{T} } \\ \end{array} } \right]^{T} \\ &\Phi ^{(1,k)} ({\mathbf{x}}_{j} ) = \left[ {\begin{array}{*{20}c} {\phi ({\mathbf{x}}_{j,1} )^{T} } & {\phi ({\mathbf{x}}_{j,2} )^{T} } & \ldots & {\phi ({\mathbf{x}}_{j,k} )^{T} } \\ \end{array} } \right]^{T} \\ &\Phi ^{(k + 1,K)} ({\mathbf{x}}_{j} ) = \left[ {\begin{array}{*{20}c} {\phi ({\mathbf{x}}_{j,k + 1} )^{T} } & {\phi ({\mathbf{x}}_{j,k + 2} )^{T} } & \ldots & {\phi ({\mathbf{x}}_{j,K} )^{T} } \\ \end{array} } \right]^{T} \\ \end{aligned} $$
(6)

In on-line monitoring, the optimization problem (7) is solved to estimate the projection scores of the test sample \( {\mathbf{x}}_{j} \); the solution is given in (8). \( \hat{t}_{i} \) denotes the estimated projection score, and \( {\mathbf{w}}_{i}^{(1,k)} \) and \( {\mathbf{p}}_{i}^{(1,k)} \) denote the parts of \( {\mathbf{w}}_{i} \) and \( {\mathbf{p}}_{i} \) corresponding to time slices 1 to k. \( \Phi _{i + 1}^{(1,k)} ({\mathbf{x}}_{j} ) \) is then obtained with (9). The on-line monitoring procedure of AKPLS is given in Algorithm 2.

$$ { \hbox{min} }\left\| {\Phi _{i}^{(1,k)} ({\mathbf{x}}_{j} )^{T} - {\hat{t}}_{i}\;{\mathbf{w}}_{i}^{(1,k)T} } \right\|_{2} $$
(7)
$$ {\hat{t}_{i} } =\Phi _{i}^{(1,k)} ({\mathbf{x}}_{j} )^{T} {\mathbf{w}}_{i}^{(1,k)} /({\mathbf{w}}_{i}^{(1,k)T} {\mathbf{w}}_{i}^{(1,k)} ) $$
(8)
$$ \Phi _{i + 1}^{(1,k)} ({\mathbf{x}}_{j} ) =\Phi _{i}^{(1,k)} ({\mathbf{x}}_{j} ) - {\hat{t}_{i} } {\mathbf{p}}_{i}^{(1,k)} $$
(9)
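
Under the same explicit feature-map assumption as in Sect. 3.2, (8) and (9) reduce to a short loop over the latent components (a sketch; names are illustrative):

```python
import numpy as np

def estimate_scores_online(phi_1k, W_1k, P_1k):
    """Estimate the scores from the first k time slices only, via (8) and (9).

    phi_1k     : (d_k,) centered partial feature vector Phi^{(1,k)}(x)
    W_1k, P_1k : (d_k, n) rows of w_i and p_i that belong to slices 1..k
                 (the additive kernel makes this slicing possible)
    """
    n = W_1k.shape[1]
    t_hat = np.zeros(n)
    phi = phi_1k.copy()
    for i in range(n):
        w, p = W_1k[:, i], P_1k[:, i]
        t_hat[i] = phi @ w / (w @ w)   # least-squares score estimate (8)
        phi = phi - t_hat[i] * p       # deflate the partial vector (9)
    return t_hat, phi                  # phi is the partial residual
```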

Algorithm 2: On-line monitoring of AKPLS

Input: \( {\mathbf{x}}^{(1,k)} \), the variables of \( {\mathbf{x}} \) from time slice 1 to k; \( {\mathbf{w}}_{i} ,{\mathbf{p}}_{i}\;(i = 1,2, \ldots ,n) \); the control limits of \( SPE_{k} \), \( \overline{SPE}_{k} \) and \( T^{2} \).

  1. Set \( i = 1 \) and center \( \Phi ^{(1,k)} ({\mathbf{x}}) \) to get \( \Phi _{1}^{(1,k)} ({\mathbf{x}}) \).

  2. If \( i \le n \), do step 3; otherwise go to step 4.

  3. Estimate \( \hat{t}_{i} \) with (8), compute \( \Phi _{i + 1}^{(1,k)} ({\mathbf{x}}) \) with (9), and set \( i = i + 1 \).

  4. Set \( {\mathbf{t}} = [\hat{t}_{1} \;\; \hat{t}_{2} \;\; \ldots \;\; \hat{t}_{n} ]^{T} \).

  5. Calculate the monitoring statistics from \( \Phi _{n + 1}^{(1,k)} ({\mathbf{x}}) \) and \( {\mathbf{t}} \).

  6. Compare the statistics with the control limits and judge whether \( {\mathbf{x}} \) is normal.

Output: whether the sample is normal or not.
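
A usage sketch of Algorithm 2 for a single test sample, reusing estimate_scores_online from above and, as in the case study, judging the sample with \( \overline{SPE}_{k} \); it assumes every time slice contributes the same number of feature dimensions.

```python
import numpy as np

def monitor_sample(phi_1k, W_1k, P_1k, score_var, spe_bar_limit, k):
    """Judge one test sample observed up to time slice k (sketch of Algorithm 2)."""
    t_hat, resid = estimate_scores_online(phi_1k, W_1k, P_1k)
    T2 = np.sum(t_hat**2 / score_var)                             # t' Lambda^{-1} t
    spe_k = np.array([np.sum(r**2) for r in np.split(resid, k)])  # SPE per slice
    spe_bar_k = spe_k.mean()                                      # mean SPE up to slice k
    return spe_bar_k <= spe_bar_limit, T2                         # normal flag, T^2
```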

A property of the additive kernel is that variables in different time slices are mapped independently into the feature space. Since \( {\mathbf{w}}_{i} \) and \( {\mathbf{p}}_{i} \) (\( i = 1,2, \ldots ,n \)) are linear combinations of the mapped training data, they can likewise be decomposed into \( {\mathbf{w}}_{i}^{(1,k)} \), \( {\mathbf{w}}_{i}^{(k + 1,K)} \) and \( {\mathbf{p}}_{i}^{(1,k)} \), \( {\mathbf{p}}_{i}^{(k + 1,K)} \), analogously to (6). In contrast, nonlinear kernels such as the Gaussian kernel cannot be decomposed as in (6), so the estimation method discussed above cannot be used without filling in the unknown data.

More experiments and analysis on score estimation using the least-squares approach can be found in [5], where estimation methods for PLS are discussed.

4 Case Study

A fed-batch penicillin fermentation dataset is used to evaluate the performance of the methods. The dataset is simulated by the standard simulator Pensim V.2.0 [1].

In our experiments, 110 normal batches and 4 abnormal batches are generated from the simulator. Every batch lasts 400 h and the sampling interval is 30 min. The faults of the 4 abnormal batches start at 60, 55, 40 and 30 h, respectively, and last until the end of the batch. The training dataset contains 100 normal batches; the test dataset contains 10 normal and 4 abnormal batches. Eight process variables (aeration rate, agitator power, substrate feed temperature, culture volume, carbon dioxide concentration, pH, bioreactor temperature and generated heat) and 3 quality variables (substrate concentration, biomass concentration and penicillin concentration) are selected. Every batch therefore contains 6400 process variable values but only 3 quality variable values, because the quality variables are obtained only at the end of each batch.

On-line monitoring starts from the 21st time point and continues until the end of the batch; the first 20 points are skipped because a small portion of data is needed to estimate the test statistics. To handle the missing-data problem, PLS and AKPLS use the method discussed in Algorithm 2 to estimate the statistics, whereas KPLS fills the data after the current time with the current values before calculating the monitoring statistics. \( \overline{SPE}_{k} \) is used to judge the test samples because it performs better than the other statistics.

From Table 1 we can see that \( \overline{SPE}_{k} \) combined with KPLS or AKPLS can discriminate normal batches from abnormal batches. The results in Table 2 show that AKPLS with the Gaussian kernel outperforms the other methods, because in most cases it alarms earlier after a fault occurs. The control charts of the three methods are presented in Fig. 2, where the time unit is 30 min.

Table 1 On-line monitoring results in penicillin dataset \( (\overline{SPE}_{k} ) \)
Table 2 Alarm time (0.5 h) of 4 abnormal batches \( (\overline{SPE}_{k} ) \)
Fig. 2 On-line monitoring results with \( \overline{SPE}_{k} \)

5 Conclusion

In this paper, we propose a special case of kernel partial least squares called additive kernel partial least squares (AKPLS) and demonstrate its application to on-line monitoring of batch processes. AKPLS uses a special kind of kernel, the additive kernel, in which variables at different time points are mapped independently into the feature space. With AKPLS, the monitoring statistics can be estimated in on-line monitoring without filling in the missing data after the current time point. Experiments on the penicillin dataset show the effectiveness of AKPLS.