1 Introduction

Bioprocesses are described as biological systems that are non-linear, complex and unsteady; thus development of precise control systems in order to achieve robust product quality and productivity can be challenging. The control of these processes can be significantly improved by online process monitoring followed by corrective actions. In this context, bioprocess digital twins are helpful tools.

Digital twins are virtual representations of the production process which enable pre-emptive process control by using online data to predict the process outcome in advance. They convert the physical process to a smart process and thus achieve the ultimate goal of the digital transformation. This enables unprecedented possibilities for timely and automated intervention to provide critical decision support during process development [1].

Digital twins mainly consist of a mathematical model which describes the dynamic behaviour observed in a biochemical reactor and a prediction or self-learning algorithm which estimates the cellular component concentrations and the process parameters that cannot be described mechanistically [2, 3].

Bioprocess mathematical models may generally be categorized into algebraic equations and dynamic models. Algebraic equations are developed from mass and component balances, from mass or heat transfer laws or even from elemental balances. Dynamic models usually consist of dynamic balances of conserved quantities in combination with kinetics to describe rate expressions as functions of the state variables. The mathematical modelling of bioprocesses has been covered by previous authors in greater detail than space allows here [4,5,6,7]. The goal of this chapter is to highlight state estimation methods with a specific focus on the Kalman filter and its non-linear extensions.

For linear systems, the Luenberger observer and the Kalman filter, whose 60th anniversary occurred in 2020 [8], are the most applied methods for estimating parameters and process variables that cannot be measured directly. In the area of non-linear systems, particle filtering (PF), high gain observers, non-linear extensions of the Kalman filter such as the extended Kalman filter (EKF) and the unscented Kalman filter (UKF) and many others have been proposed. However, due to their simple structure and low computational effort, the non-linear extensions of the Kalman filter have gained more interest, and many research studies have been dedicated to the implementation of such filters for state and parameter estimation in bioprocess technologies. The main objective of this chapter is to discuss the applications of different Kalman filter algorithms in bioprocess technologies. The chapter is organized as follows: the next section gives a brief overview of Kalman filtering theory and its non-linear extensions. Applications of the Kalman filter for the supervision of cultivation processes are presented in the third section, followed by a case study evaluating the implementation of an extended Kalman filter for developing a digital twin of the baker’s yeast batch cultivation process. The last section presents a conclusion.

2 Kalman Filtering Theory and Its Non-linear Extensions

The Kalman filter is a set of mathematical equations that provides an efficient computational solution of the least-squares method when the considered system is linear and the uncertainties are modelled by Gaussian random variables. When the system state dynamics are non-linear, certain linearization methods are applied. The most prominent of these algorithms are the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), invented independently by several research groups. The extensions of the Kalman filter differ in the way the estimation error is calculated. A brief overview of these methods is given below.

2.1 The Kalman Filter

The Kalman filter is used to provide optimal estimates of unmeasured states for time-varying linear systems in the presence of noise by combining information from a mathematical process model with online process measurements. The process model defines the evolution of the state from time k−1 to time k as:

$$ {\boldsymbol{x}}_{\left[k\right]}={A\boldsymbol{x}}_{\left[k-1\right]}+{B\boldsymbol{u}}_{\left[k-1\right]}+{\boldsymbol{w}}_{\left[k-1\right]} $$
(1)

where x is the state vector, u is the process input and w is the Gaussian process noise vector that is assumed to be zero-mean with the covariance Q. Matrix A relates the state at the previous time step k−1 to the state at the current step k, and matrix B relates the control input u to the state variables x.

The process model is paired with the measurement model that describes the relationship between the state and the measurement at the current time step k as:

$$ {\boldsymbol{z}}_{\left[k\right]}={C\boldsymbol{x}}_{\left[k\right]}+{\boldsymbol{v}}_{\left[k\right]} $$
(2)

where z is the measurement vector and v is the Gaussian measurement noise vector which is assumed to be zero-mean with the covariance R. Matrix C relates the state to the measurement z[k]. Since the measurements do not exhaustively inform on the current situation of the process, the KF aims to provide an estimate of the process state at time k, given the initial state x0, the measurements and the system model.

The Kalman filter algorithm consists of two steps which are summarized as follows:

  • Prediction step (time update): Starting from the initial condition, the process model is used to predict the state variables and the estimation error covariances until the first measurement is available.

$$ {\boldsymbol{x}}_{\left[k\right]}={A\boldsymbol{x}}_{\left[k-1\right]}+{B\boldsymbol{u}}_{\left[k-1\right]} $$
(3)
$$ {P}_{\left[k\right]}=A{P}_{\left[k-1\right]}{A}^T+Q $$
(4)

In the above equations, x[k] is the state estimate at time k, which is deduced from the previous estimate of the state x[k − 1] at time k−1. The new term P is called the state error covariance matrix, which encodes the error covariance of the predicted state values. P[k] is the new prediction error covariance matrix at time k and P[k − 1] is the previously estimated error covariance matrix at time k−1. Whenever a measurement is available, a correction step is performed:

  • Correction step (measurement update): In this step the predicted model estimates are combined with the measured values to provide corrected estimates.

$$ {\boldsymbol{x}}_{f,\left[k\right]}={\boldsymbol{x}}_{\left[k\right]}+{K}_{\left[k\right]}\left({\boldsymbol{z}}_{\left[k\right]}-C{\boldsymbol{x}}_{\left[k\right]}\right) $$
(5)
$$ {P}_{f,\left[k\right]}=\left(I-{K}_{\left[k\right]}C\right){P}_{\left[k\right]}{\left(I-{K}_{\left[k\right]}C\right)}^T+{K}_{\left[k\right]}R{K}_{\left[k\right]}^T $$
(6)
$$ {K}_{\left[k\right]}={P}_{\left[k\right]}{C}^T{\left(R+{CP}_{\left[k\right]}{C}^T\right)}^{-1} $$
(7)

The measurement prediction error reflects the discrepancy between the true measurements z[k] and the predicted measurements Cx[k]. This difference is multiplied by the so-called Kalman gain and used to update the estimated state variables, which yields the filtered state variables xf,[k]. In a similar manner, the filtered estimation error covariance Pf,[k] is obtained. K[k] is chosen to minimize the estimated error covariance:

$$ \frac{d{P}_f}{dK}=0 $$
(8)
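For a scalar state and a scalar measurement, the condition of Eq. (8) can be worked out explicitly. The following short derivation is only a sketch under that scalar assumption; it starts from the scalar filtered error variance $P_f=P_{[k]}(1-K_{[k]}C)^2+K_{[k]}^2R$ and recovers the scalar form of Eq. (7):

$$ \frac{dP_f}{dK}=-2C{P}_{\left[k\right]}\left(1-{K}_{\left[k\right]}C\right)+2{K}_{\left[k\right]}R=0\kern1em \Rightarrow \kern1em {K}_{\left[k\right]}=\frac{P_{\left[k\right]}C}{R+C{P}_{\left[k\right]}C} $$

Substituting this gain back gives $P_f=(1-K_{[k]}C)P_{[k]}$ with $1-K_{[k]}C=R/(R+CP_{[k]}C)$, so for R > 0 the filtered variance is never larger than the predicted one.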

To see how the filter behaves, the measurement error variance R can be compared with the estimation error variance mapped into the measurement space, CP[k]Cᵀ. A very rough treatment of the two limiting cases is sufficient for this purpose:

If R ≪ CP[k]Cᵀ, then K[k] ≈ C⁻¹ and xf,[k] ≈ C⁻¹z[k]; the filtered value is determined almost entirely by the measurement.

If R ≫ CP[k]Cᵀ, then K[k] ≈ 0 and xf,[k] ≈ x[k]; the filtered value is almost identical to the predicted one and the measurement has practically no influence.
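The two limiting cases can be checked numerically with the scalar form of Eq. (7). The snippet below is a minimal sketch; the values of P, C and R are arbitrary illustrative numbers and not taken from any experiment.

```python
def kalman_gain(P, C, R):
    """Scalar form of Eq. (7): K = P C / (R + C P C)."""
    return P * C / (R + C * P * C)

# Arbitrary illustrative values for the predicted variance and the measurement factor
P, C = 2.0, 0.5

K_small_R = kalman_gain(P, C, R=1e-9)  # R << C P C: the measurement dominates
K_large_R = kalman_gain(P, C, R=1e9)   # R >> C P C: the model prediction dominates

print(K_small_R, 1.0 / C)  # the gain approaches 1 / C
print(K_large_R)           # the gain approaches 0
```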

With the filtered values as initial conditions, the simulation of the process as well as of the estimation error covariances is carried out until the next measurement is obtained, and the procedure repeats. The flow chart of the Kalman filter algorithm is presented in Fig. 1.

Fig. 1 The flow chart of the Kalman filter algorithm
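The prediction and correction steps of Eqs. (3) to (7) can be sketched for a scalar system as follows. This is an illustration only: the model constants, the initial values and the measurement list are invented for the example, and the covariance update is written in the scalar form P(1 − KC)² + K²R.

```python
def kf_predict(x, P, A, B, u, Q):
    """Time update, Eqs. (3) and (4), for a scalar system."""
    x_pred = A * x + B * u
    P_pred = A * P * A + Q
    return x_pred, P_pred

def kf_correct(x_pred, P_pred, z, C, R):
    """Measurement update, Eqs. (5) to (7), for a scalar system."""
    K = P_pred * C / (R + C * P_pred * C)
    x_f = x_pred + K * (z - C * x_pred)
    P_f = P_pred * (1.0 - K * C) ** 2 + K ** 2 * R
    return x_f, P_f, K

# Invented example: a constant quantity close to 1 observed with noise
A, B, C, Q, R = 1.0, 0.0, 1.0, 0.01, 0.25
x, P = 0.0, 1.0                      # deliberately poor initial guess
measurements = [1.1, 0.9, 1.05, 0.98, 1.02]

for z in measurements:
    x, P = kf_predict(x, P, A, B, 0.0, Q)
    x, P, K = kf_correct(x, P, z, C, R)
    print(round(x, 3), round(P, 3), round(K, 3))
```

With each measurement the gain decreases, because the estimation error variance shrinks relative to R and the filter relies more on its own prediction.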

2.2 Continuous-Discrete Extended Kalman Filter

As described in the previous section, the Kalman filter addresses the general problem of estimating the state of a process that is governed by a linear difference equation system (Eq. 1). In non-linear dynamic systems, the process model or the measurement model cannot be expressed as products of matrices and vectors. For such systems, a linearization has to be performed, which can be done by different methods. The essential difference among the versions of the Kalman filter (extended Kalman filter, unscented Kalman filter and ensemble Kalman filter) lies in how they calculate the estimation error. A Kalman filter that linearizes about the current mean and covariance is referred to as an extended Kalman filter (EKF). A non-linear dynamic system can be described by the following differential equation:

$$ \frac{dx(t)}{dt}=f\left(x(t),u(t)\right)+w(t) $$
(9)

With discrete measurements that are:

$$ {\boldsymbol{z}}_{\left[k\right]}=h\left[\boldsymbol{x}\left({\boldsymbol{t}}_{\left[k\right]}\right)\right]+{\boldsymbol{v}}_{\left[k\right]} $$
(10)

The differential equation provides the continuous part and the measurements are the discrete part. Here f is a non-linear function of the state variables x and the control input u. The non-linear function h in the measurement equation relates the current state to the measurement z[k]. w and v are, respectively, the process noise vector and the measurement noise vector. These noises are assumed to be zero-mean, white and independent of each other, with respective covariance matrices Q and R.

To calculate the estimation error covariance matrix, the following differential equations have to be solved in parallel to the state differential equation.

$$ \frac{d\boldsymbol{P}\left(\boldsymbol{t}\right)}{dt}=F(t)P(t)+P(t){F}^T(t)+Q $$
(11)

Here the Jacobian matrix is used, which is given by the following equation:

$$ F={\left.\frac{\partial f}{\partial x}\ \right|}_{x(t),\kern0.5em u(t)} $$
(12)
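When an analytical Jacobian is inconvenient to derive, Eq. (12) can be approximated numerically. The sketch below uses central finite differences, which is a substitute technique chosen for illustration and not a method described in this chapter; the function name `numerical_jacobian` and the two-state test function are hypothetical.

```python
def numerical_jacobian(f, x, u, eps=1e-6):
    """Approximate F = df/dx at (x, u) by central differences.

    f takes a list of states x and an input u and returns a list of derivatives.
    """
    n = len(x)
    m = len(f(x, u))
    F = [[0.0] * n for _ in range(m)]
    for j in range(n):
        step = eps * max(1.0, abs(x[j]))   # scale the step with the state magnitude
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += step
        x_minus[j] -= step
        f_plus, f_minus = f(x_plus, u), f(x_minus, u)
        for i in range(m):
            F[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * step)
    return F

# Hypothetical test function: f1 = x0 * x1, f2 = -x0**2 + u
def f_example(x, u):
    return [x[0] * x[1], -x[0] ** 2 + u]

F = numerical_jacobian(f_example, [2.0, 3.0], 1.0)
print(F)  # analytic Jacobian at x = (2, 3) is [[3, 2], [-4, 0]]
```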

The filtering is performed as follows:

$$ {K}_{\left[k\right]}=P\left({t}_k\right){H}^T\left({t}_k\right){\left[H\left({t}_k\right)P\left({t}_k\right){H}^T\left({t}_k\right)+R\right]}^{-1} $$
(13)
$$ {\boldsymbol{x}}_f\left({t}_{\left[k\right]}\right)=\boldsymbol{x}\left({t}_{\left[k\right]}\right)+{K}_{\left[k\right]}\left[{\boldsymbol{z}}_{\left[k\right]}-h\left[\boldsymbol{x}\left({\boldsymbol{t}}_{\left[k\right]}\right)\right]\right] $$
(14)
$$ {P}_f\left({t}_{\left[k\right]}\right)=\left[I-{K}_{\left[k\right]}{H}_{\left[k\right]}\right]P\left({t}_{\left[k\right]}\right){\left[I-{K}_{\left[k\right]}{H}_{\left[k\right]}\right]}^T+{K}_{\left[k\right]}R{K}_{\left[k\right]}^T $$
(15)

where H[k] is the Jacobian matrix of h:

$$ {H}_{\left[k\right]}={\left.\frac{\partial h}{\partial x}\ \right|}_{x_{\left[k\right]}} $$
(16)

Like the KF algorithm, the EKF algorithm consists of two main parts: the prediction step and the correction step.
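The interplay of the continuous prediction (Eqs. 9 and 11) and the discrete correction (Eqs. 13 to 15) can be sketched for a scalar state. Everything specific in this example is an assumption made for illustration: the logistic growth function `f`, its constants, the direct measurement h(x) = x, the noise levels, and the use of simple explicit Euler steps instead of a higher-order ODE solver. The "measurements" are generated noise-free from the same model.

```python
def f(x, r=0.5, x_max=10.0):
    """Hypothetical non-linear process model: logistic growth."""
    return r * x * (1.0 - x / x_max)

def jac_f(x, r=0.5, x_max=10.0):
    """Jacobian of f with respect to x, Eq. (12), for the scalar model."""
    return r * (1.0 - 2.0 * x / x_max)

def ekf_predict(x, P, Q, t_span, dt=0.01):
    """Integrate Eq. (9) without noise and Eq. (11) with explicit Euler steps."""
    for _ in range(int(round(t_span / dt))):
        F = jac_f(x)
        x, P = x + dt * f(x), P + dt * (2.0 * F * P + Q)
    return x, P

def ekf_correct(x, P, z, R):
    """Eqs. (13) to (15) for h(x) = x, so that H = 1."""
    H = 1.0
    K = P * H / (H * P * H + R)
    x_f = x + K * (z - x)
    P_f = (1.0 - K * H) * P * (1.0 - K * H) + K * R * K
    return x_f, P_f

x_true = 1.0                 # state of the simulated "real" process
x_est, P_est = 0.5, 1.0      # wrong initial estimate with large uncertainty
Q, R = 0.01, 0.04            # assumed noise levels
for _ in range(5):           # one measurement per time unit
    x_true, _ = ekf_predict(x_true, 0.0, 0.0, 1.0)   # reference trajectory
    x_est, P_est = ekf_predict(x_est, P_est, Q, 1.0)
    x_est, P_est = ekf_correct(x_est, P_est, x_true, R)
    print(round(x_true, 3), round(x_est, 3), round(P_est, 4))
```

Between two measurements the covariance grows according to Eq. (11); each correction then pulls the estimate towards the measurement and reduces the covariance again.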

As mentioned above, the basic framework for the EKF involves state estimation of a non-linear dynamic system. However, in some cases, prediction of x[k] requires coupling both state estimation and parameter estimation [9]. Here a process model parameter p(t) is considered to be time dependent and can be estimated by adding the parameter as an additional state variable whose differential equation is then given as

$$ \frac{d\boldsymbol{p}(t)}{dt}=0 $$
(17)

At every time step, the current estimate of the parameter p(t) is used in the measurement filter. In the joint estimation method, model state variables and model parameters are included in a single joint state vector. Parameter estimation evolves in time along with state estimation, as observations are assimilated [10].
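Joint estimation with an augmented state vector can be sketched for the smallest possible case: one state x with dx/dt = p·x and one unknown parameter p with dp/dt = 0 (Eq. 17). The model, all numerical values and the Euler integration are assumptions made for this illustration. The symmetric 2 × 2 covariance matrix is stored by its three distinct entries, and only x is measured, so H = [1, 0].

```python
def predict(x, p, a, b, c, q_x, q_p, t_span, dt=0.01):
    """Euler integration of dx/dt = p*x, dp/dt = 0 and of Eq. (11).

    a = P_xx, b = P_xp, c = P_pp; the Jacobian is F = [[p, x], [0, 0]].
    """
    for _ in range(int(round(t_span / dt))):
        da = 2.0 * (p * a + x * b) + q_x
        db = p * b + x * c
        dc = q_p
        x, a, b, c = x + dt * p * x, a + dt * da, b + dt * db, c + dt * dc
    return x, p, a, b, c

def correct(x, p, a, b, c, z, R):
    """Measurement update, Eqs. (13) to (15), for z = x + v, i.e. H = [1, 0]."""
    S = a + R                      # innovation variance
    K0, K1 = a / S, b / S          # gains for the state and for the parameter
    innovation = z - x
    x_f, p_f = x + K0 * innovation, p + K1 * innovation
    a_f = (1.0 - K0) ** 2 * a + K0 ** 2 * R
    b_f = (1.0 - K0) * (b - K1 * a) + K0 * K1 * R
    c_f = K1 ** 2 * a - 2.0 * K1 * b + c + K1 ** 2 * R
    return x_f, p_f, a_f, b_f, c_f

p_true, x_true = 0.3, 1.0          # simulated "real" process
x, p = 1.0, 0.1                    # parameter guess is deliberately wrong
a, b, c = 0.01, 0.0, 0.04          # initial covariance entries
for _ in range(6):
    x_true, _, _, _, _ = predict(x_true, p_true, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)
    x, p, a, b, c = predict(x, p, a, b, c, 1e-4, 1e-4, 1.0)
    x, p, a, b, c = correct(x, p, a, b, c, x_true, 1e-4)
    print(round(x, 3), round(p, 3))
```

The parameter is never measured directly. It is corrected only through the cross-covariance b, which builds up during the prediction because the state depends on the parameter.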

Other alternatives for parameter estimation with the KF include calibrating parameters outside the KF calculation with an outer optimisation routine [11,12,13], and parameter estimation in steady-state KF calculations where observations are climatological averages over the entire time period of interest [14], but in both of these two approaches the parameter estimation part of the calculation considers all observations at once rather than sequentially.

2.3 Other Non-linear Extensions of the Kalman Filter

As mentioned previously, when the system is non-linear and can be well approximated by linearization, the EKF is a good option for state estimation. However, the EKF is not optimal if the system is highly non-linear, because only the mean is propagated through the non-linearity [15]. The unscented Kalman filter (UKF) is another non-linear extension of the Kalman filter and is a discrete-time filtering algorithm. The UKF utilizes the unscented transformation for computing approximate solutions to the filtering problem.

A general framework for state estimation based on the UKF for a non-linear state space model is as follows:

In the first step, the initial values for the state and covariance estimation have to be set. Following this, the recursive estimation is performed by the prediction and correction steps. Within the prediction step, an a priori state and covariance estimation utilizing the process model is performed. Using the unscented transformation, a set of sigma points is chosen. These sigma points characterize the current probability density function. Each point from the sigma matrix is propagated through the process model to calculate the estimates of the state variables and the error covariance. Following this, a correction step is performed when a measurement is received. This leads to the estimates of the filtered state variables and the filtered error covariance by calculating the Kalman gain.
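The core of the prediction step, the unscented transformation, can be sketched for a scalar Gaussian variable. The sketch uses the basic three-point form with a single scaling parameter `kappa`; the choice kappa = 2 (so that n + kappa = 3) and the test function g(x) = x² are assumptions made for illustration, and practical UKF implementations often use additional scaling parameters.

```python
import math

def unscented_transform_scalar(mean, var, g, kappa=2.0):
    """Propagate N(mean, var) through g using 2n + 1 = 3 sigma points (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    y = [g(s) for s in sigma_points]                 # propagate each point
    y_mean = sum(w * yi for w, yi in zip(weights, y))
    y_var = sum(w * (yi - y_mean) ** 2 for w, yi in zip(weights, y))
    return y_mean, y_var

def square(x):
    return x * x

m, P = 1.0, 0.25
ut_mean, ut_var = unscented_transform_scalar(m, P, square)
linearized_mean = square(m)   # an EKF-style linearization propagates only the mean

print(ut_mean, ut_var, linearized_mean)
```

For this quadratic function the exact mean of g(x) is m² + P, which the sigma points reproduce, whereas propagating only the mean gives m² and misses the contribution of the variance.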

The UKF has been used in various fields for non-linear state estimation. However, a couple of alternative approaches have emerged over the last few years, namely the ensemble Kalman filter (EnKF) and the cubature Kalman filter (CKF), which are widely used when the process model is of extremely high order and non-linear, the initial states are highly uncertain and a large number of measurements are available [16, 17].

Similar to the UKF, the EnKF and CKF select a set of sample points (sigma points) in order to deal with the non-linearity of the system. In high-dimension systems, the weights of the sigma points in the UKF are prone to be negative, leading to low estimation accuracy.

In the EnKF, the error covariances are estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker–Planck equation, can be estimated using ensemble integrations, and the error covariance matrices can be calculated by integrating the ensemble of model states [16].
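How an ensemble replaces the covariance equation can be sketched for a directly measured scalar state. The function below is a hypothetical illustration of the perturbed-observation variant of the EnKF analysis step; the ensemble values, the observation and the optional `perturbations` argument (which allows a reproducible call) are assumptions made for the example.

```python
import random

def enkf_update(ensemble, z, R, perturbations=None, rng=None):
    """EnKF analysis step for a scalar state that is measured directly (H = 1)."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    # Sample variance of the forecast ensemble replaces the propagated covariance
    P = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    K = P / (P + R)
    if perturbations is None:
        rng = rng or random.Random(0)
        perturbations = [rng.gauss(0.0, R ** 0.5) for _ in ensemble]
    # Each member is corrected with its own perturbed copy of the observation
    updated = [x + K * (z + e - x) for x, e in zip(ensemble, perturbations)]
    return updated, P, K

forecast = [1.0, 2.0, 3.0, 4.0, 5.0]
updated, P, K = enkf_update(forecast, z=5.0, R=2.5, perturbations=[0.0] * 5)
print(P, K, updated)

noisy, _, _ = enkf_update(forecast, z=5.0, R=2.5)   # with random perturbations
print(noisy)
```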

The cubature Kalman filter uses the spherical–radial cubature rule to generate a set of weighted sampling points that approximate the integrals in Bayesian estimation. A brief overview of unscented Kalman filtering and of sigma point filtering in general is given by van der Merwe [18].
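The cubature points can be sketched as follows for the special case of a diagonal covariance matrix, which avoids a matrix square root. The third-degree rule places 2n points at a distance of √n standard deviations along each axis, all with the same weight 1/(2n). The function name, the two-dimensional example and the test function are assumptions made for illustration.

```python
import math

def cubature_points(mean, variances):
    """Third-degree spherical-radial cubature points for a diagonal covariance."""
    n = len(mean)
    points = []
    for i in range(n):
        offset = math.sqrt(n * variances[i])
        for sign in (1.0, -1.0):
            point = list(mean)
            point[i] += sign * offset
            points.append(point)
    weight = 1.0 / (2 * n)          # all 2n points carry the same positive weight
    return points, weight

def g(x):
    """Hypothetical non-linear function of two states."""
    return x[0] ** 2 + x[1] ** 2

points, weight = cubature_points([1.0, 2.0], [0.5, 0.3])
estimate = sum(weight * g(p) for p in points)
print(len(points), estimate)   # exact expectation: 1 + 0.5 + 4 + 0.3
```

Because all weights are equal and positive, the rule does not suffer from the negative weights that can occur with sigma points in high-dimensional systems.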

3 Application of Kalman Filters in Bioprocess Monitoring

Here, 41 articles [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60] published in the period 1991–2020 on the application of the Kalman filter and its extensions for state and parameter estimation in bioprocesses are discussed. Due to space limitations, only some of the reported articles are presented in Table 1. The table classifies the articles by the type of Kalman filter and the applied process model, the type of microorganism and the cultivation process mode, the measured process variable(s) and the objective of the filtering algorithm. It helps to understand how the Kalman filter has been explored chronologically to date. It should be mentioned that in some works more than one Kalman filter algorithm is examined. A more detailed description of each category for all publications is presented in the remainder of this section.

Table 1 Extended Kalman filter application for cultivation processes

3.1 Type of Kalman Filter

With regard to the type of Kalman filter algorithm, the literature presented contains a considerable number of articles on the implementation of the EKF for state and parameter estimation. More than 60% of the applications (28 articles) have implemented EKF algorithms for their process. This is due to the fact that the cultivation of microorganisms is a complex non-linear biochemical process and the EKF is a well-known state estimation method for non-linear systems. The linear Kalman filter, which is almost exclusively used for state estimation in linear systems, has also been used by some authors (3 articles). Although the EKF shows good prediction results and is widely used in the literature, it has some disadvantages. It is reliable only for systems which are almost linear on the time scale of the update intervals; it requires the calculation of Jacobians at each time step, which may be difficult to obtain for higher-order systems; and it relies on linear approximations of the system at a given time instant, which may introduce estimation errors that cause the state estimate to diverge over time [9, 15]. For instance, in continuous or fed-batch cultivations, despite the continuous supply of feed, the substrate concentration can drop to zero because the cells take it up very fast. In such cultivations, linearization in the time and measurement update can lead to significant inaccuracies, because the EKF assigns a certain probability to substrate concentrations below zero, even though this is physically impossible [54]. Therefore, in recent years other non-linear extensions of the Kalman filter have been applied. For example, Fernandes et al. [54] implemented a UKF algorithm to estimate glucose and glutamine from biomass, lactate and ammonia measurements during fed-batch cultivation of hybridoma cells. The predictions were compared to those obtained with an EKF, and the authors reported that the UKF achieves a better level of accuracy.
Krämer and King [57] implemented a UKF in fed-batch cultivation of S. cerevisiae for filtering noise from biomass values predicted with an NIR spectrometer. In another study, the same authors [54] implemented an EKF for the same process. The authors reported accurate predicted values in both studies; however, the two methods were not compared. Other types of non-linear Kalman filtering have also been reported in the literature. Zhao et al. [53] implemented a CKF for incorporating delayed measurements of biomass, substrate and product concentration in fed-batch cultivation for penicillin production. Bavdekar et al. [47] implemented an EnKF for overcoming delayed measurements of biomass, substrate and ethanol concentration in fed-batch cultivation of S. cerevisiae. Addressing the same delay problem, Klockow et al. [43] combined a ring buffer with an EKF and obtained satisfactory results.

In order to indicate which Kalman filter extension describes the process better, numerical simulation runs are required. From this perspective, a closer look at the presented articles indicates that most studies (31 articles) relied on practical applications, whereas simulation studies have been reported only 12 times.

3.2 Microorganism

Regarding the type of microorganism, the articles show that the majority of the research has focused on applying the Kalman filter or its extensions for state or parameter estimation during the cultivation of S. cerevisiae (19 articles) and E. coli (7 articles). The importance of these microorganisms for the biopharmaceutical industry is widely recognized, as E. coli and S. cerevisiae are the most important host microorganisms used to produce recombinant proteins [58]. In addition, S. cerevisiae is also widely used for the production of baker’s yeast as well as wine and beer. Only a few articles demonstrate state estimation in the cultivation of other microorganisms. For instance, some authors have implemented state estimation methods for the prediction of substrate and product concentrations during cultivations of Candida utilis [30], Penicillium chrysogenum [46, 53] and Kluyveromyces marxianus [34].

3.3 Cultivation Mode

From an operational point of view, cultivation of microorganisms can be performed in batch, fed-batch and continuous modes. In fed-batch cultivation, set point control of the substrate concentration by manipulating the input flow rate is a matter of particular economic and scientific interest. An efficient control system requires sufficient knowledge about the process state variables, which can be obtained by state estimation methods such as the Kalman filter or its extensions. Therefore, previous studies have almost exclusively focused on the application of state estimation methods for fed-batch cultivations (34 publications). However, online monitoring and estimation of state variables in batch cultivations is also crucial in order to follow the process state and, if necessary, to improve it so that high productivity is achieved over the process. For instance, controlling the level of dissolved oxygen (DO) in the fermentation broth affects the rate of microbial metabolism. Accordingly, Lee et al. [19] implemented an EKF for noise filtering of dissolved oxygen measurements which were used for controlling the DO level in batch cultivation of E. coli. This approach and, more generally, online monitoring and state estimation in batch cultivations have received little attention in the literature.

3.4 Bioprocess Phase

Mixing of the medium and the pre-cultures is performed during the upstream processing phase, and separation and purification of the product from the biomass are performed during the downstream processing phase. In order to optimize cell growth and maximize the product yield, online monitoring and tight control are required during both phases. The presented articles show that numerous studies have investigated the application of state estimation methods during the cultivation phase (39 papers). However, only two studies have examined the application of Kalman filtering methods for state and variable estimation in downstream processing. For efficient and robust process development in the downstream processing phase, knowledge of the location and concentration of the product and key contaminants is also crucial. Holwill et al. [28] used a low-technology detection system that measures the rate of change of absorbance at a single wavelength after addition of a reagent to a representative sample stream. This provided online data detailing the performance of a continuous precipitation process. This information, together with a mathematical model describing the fractional protein precipitation, was fed into a control algorithm programmed to maintain predefined set points by feedback control through adjustments to the overall feed saturation. The Kalman filter was used for estimating the parameters of the model. Feidl et al. [59] developed a state estimation procedure for the antibody concentration by combining information from a kinetic model and a Raman analyser in the framework of an extended Kalman filter (EKF).

3.5 Measurement Device

An overview of measurement devices that are appropriate for the operation of bioprocesses is presented by Sonnleitner [61]. More specific details of different types of sensors and their measurement principles can be found in the literature [62, 63]. The literature presented indicates that in E. coli cultivation, most authors have employed DO and CO2 measurements from the exit gas or glucose measurements using flow injection analysis as the measurement in the Kalman filter algorithm. On the other hand, in S. cerevisiae cultivations, besides DO, CO2 and glucose measurements, biomass measurements have also been widely applied. For example, Dewasme et al. [48] applied biomass measurements for their KF during an E. coli cultivation.

3.6 Process Model

According to the articles presented, general mass balance equations are the most common mathematical approach used for describing the process in state observer algorithms. An overview of typical models applied to bioprocesses is presented by Chhatre [64]. A wide variety of growth kinetics have been developed for modelling particular bioprocesses. The Monod growth model [65] is the most applied method for calculating the growth kinetics of microorganisms; it corresponds to a rational function in which the specific growth rate μ is a function of a single limiting substrate concentration only and is subject to substrate saturation when S ≫ Ks:

$$ \mu ={\mu}_{\mathrm{max}}\ \frac{S}{\ {K}_s+S} $$
(18)

where μmax is the maximum specific growth rate, Ks is the Monod half-saturation constant, and S is the concentration of the limiting substrate. In the mentioned articles, all of the authors who cultivated S. cerevisiae and E. coli have implemented Monod growth kinetics. A modified Monod model was applied by Patnaik [35, 38], which is described in detail by Henson and Seborg [66] or Jones and Kompala [67]. The application of other methods for calculating the growth kinetics, such as the Contois growth model [68], has also been reported. A feature of the Contois growth model is that the growth rate depends upon the concentrations of both substrate and cell mass, with the consequence that an inhibition is present at high cell concentrations. This growth kinetic has been implemented in a process model describing the growth behaviour of Penicillium chrysogenum in fed-batch cultivations. A modified Contois model was applied by Jianlin et al. [48] and Zhao et al. [53] in a UKF and a CKF algorithm for biomass and substrate prediction, respectively. The growth rate can also be represented by artificial neural networks. However, this kind of model is rarely applied in combination with a KF. Zorzetto and Wilson [27] applied a hybrid model in an EKF algorithm which is based on the theory of limited respiratory capacity and uses an artificial neural network for predicting the growth rates during fed-batch cultivation of S. cerevisiae.
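The difference between the two kinetics can be illustrated with a few lines of code. The Monod function follows Eq. (18); the Contois function uses its common textbook form, in which the saturation term is proportional to the biomass concentration X. All parameter values below are arbitrary and serve only as an example.

```python
def monod(S, mu_max, K_s):
    """Eq. (18): specific growth rate limited by a single substrate."""
    return mu_max * S / (K_s + S)

def contois(S, X, mu_max, K_c):
    """Contois kinetics: the saturation term scales with the biomass X."""
    return mu_max * S / (K_c * X + S)

mu_max = 0.4  # arbitrary example value in 1/h

print(monod(0.1, mu_max, K_s=0.1))      # S = Ks gives half of mu_max
print(monod(100.0, mu_max, K_s=0.1))    # S >> Ks approaches mu_max (saturation)
print(contois(1.0, 1.0, mu_max, K_c=0.1))
print(contois(1.0, 10.0, mu_max, K_c=0.1))  # higher biomass lowers the growth rate
```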

Most of the process models which are reported in the literature and used in Kalman filter algorithms assume ideal stirred tank reactors, whereas production-scale operations are corrupted by noise. This problem is more severe in large-scale operations than in laboratory-scale fermentations [35]. This may explain why all applications of state estimation methods presented in Table 1 were performed in laboratory-scale bioreactors (most cultivations were performed in a 2–5 L bioreactor, and one cultivation [57] was performed in a 22 L bioreactor).

4 An Extended Kalman Filter for the Monitoring of a Yeast Cultivation

The integration of gas sensor array data in a non-linear state estimator has not been discussed previously in the literature. Yousefi-Darani et al. [69] designed and implemented a model-based calibrated gas sensor array for online measurement of the ethanol concentration in batch cultivation of the yeast S. cerevisiae. However, the predicted values are only available every 5 min. Therefore, in this work we have implemented an EKF in order to obtain continuous values of the ethanol concentration as well as of the biomass and glucose concentrations and the maximal growth rates. In addition, the whole estimation procedure could be considered as a digital twin of the baker’s yeast batch cultivation process, which could be used for process optimization and control.

4.1 The Cultivation Process

The cultivation of Saccharomyces cerevisiae (fresh baker’s yeast, Oma’s Ur-Hefe) was carried out in a 2.5 L bioreactor (Minifors, Infors HT, Bottmingen, Switzerland) with a stainless steel vessel and a working volume of 1.35 L, equipped with temperature (set point 30°C) and pH (set point pH = 5) control units. The aeration and agitation rates were kept constant at 3.5 L min−1 and 500 rpm, respectively. For the pre-culture, 5 g of the baker’s yeast was suspended in 100 mL medium containing 0.34 g L−1 MgSO4·7H2O, 0.42 g L−1 CaCl2·2H2O, 4.5 g L−1 (NH4)2SO4, 1.9 g L−1 (NH4)2HPO4 and 0.9 g L−1 KCl. The inoculation was performed after 10 min of shaking. The same medium supplemented with glucose to a final concentration of 5 g L−1 as well as 1 mL L−1 trace elements solution (0.015 g L−1 FeCl3·6H2O, 9 mg L−1 ZnSO4·7H2O, 10.5 mg L−1 MnSO4·2H2O and 2.4 mg L−1 CuSO4·5H2O) and 1 mL L−1 vitamin solution (0.06 g L−1 myoinositol, 0.03 g L−1 Ca-pantothenate, 6 mg L−1 thiamine HCl, 1.5 mg L−1 pyridoxine HCl and 0.03 mg L−1 biotin) was used for the cultivation. The experimental setup is presented in Fig. 2.

Fig. 2 Overview of the experimental setup

4.2 EKF Algorithm

The EKF uses discrete measurements of ethanol from the gas sensor array and estimates continuous online values of ethanol, biomass and glucose concentrations as well as the maximal growth rates in S. cerevisiae batch cultivation. A detailed description of the working principle of the EKF is presented in Sect. 2.2.

The EKF was implemented using the software Matlab® 2019a (version 9.6.0); the “Symbolic Math” toolbox (version 8.3) was used to calculate the matrix differential equation of the estimation error covariance (25 equations). For all calculations, a normal office PC (Intel Core® i5-8500 with 8 GiB of RAM) with Windows 10 was used. For the simulation, the system of in total 30 (5 + 25) differential equations was solved numerically using the explicit, Runge–Kutta-based ode45 method from Matlab. The Matlab code can be found in the appendix.
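The count of 30 equations follows from stacking the 5 state equations and the 5 × 5 entries of the covariance equation (Eq. 11) into one vector that an ODE solver can integrate. The following Python sketch shows one possible way of doing this; it is not the Matlab code from the appendix, and the helper names `pack`, `unpack` and `augmented_rhs` are hypothetical.

```python
def pack(x, P):
    """Stack n states and the n x n covariance into one vector of n + n*n entries."""
    n = len(x)
    return list(x) + [P[i][j] for i in range(n) for j in range(n)]

def unpack(y, n):
    """Inverse of pack: recover the state vector and the covariance matrix."""
    x = list(y[:n])
    P = [list(y[n + i * n: n + (i + 1) * n]) for i in range(n)]
    return x, P

def augmented_rhs(y, n, f, jacobian, Q):
    """Right-hand side of the stacked system: Eq. (9) without noise plus Eq. (11)."""
    x, P = unpack(y, n)
    F = jacobian(x)
    # dP/dt = F P + P F^T + Q, written out element by element
    dP = [[sum(F[i][k] * P[k][j] + P[i][k] * F[j][k] for k in range(n)) + Q[i][j]
           for j in range(n)] for i in range(n)]
    return pack(f(x), dP)

n = 5
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
y0 = pack([0.0] * n, identity)
print(len(y0))  # 5 states + 25 covariance entries
```

Since P is symmetric, only 15 of the 25 covariance equations are independent; integrating all 25 is simpler to code at the cost of some redundant computation.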

4.3 Online Ethanol Measurements

The online ethanol measurements were performed in a self-developed system equipped with commercially available metal oxide semiconductor (MOS) gas sensors (TGS 822, TGS 813 and MQ3). The sensors were located in a measuring chamber with a volume of 250 mL and operated in two cycles: a measurement cycle and a washing cycle. During the measurement cycle, the headspace gas was pumped into the measurement chamber for 10 s at a flow rate of 400 mL min−1 with a diaphragm pump (Schwarzer Precision, Essen, Germany). Then the chamber was flushed with pure oxygen for regeneration. A peak-shaped measurement signal was obtained and evaluated using a chemometric model, which is described in detail in the literature [69]. As a result, a new ethanol measurement value is passed to the Kalman filter every 5 min. Figure 3 presents a schematic diagram of the online ethanol measurement system and the EKF for continuous state variable and parameter estimation.

Fig. 3 Schematic diagram of the online ethanol measurement system and the EKF for continuous state variables and parameter estimation

Note that the EKF was carried out after the experiments were performed. The results, however, carry over to a true online application where the data is not analysed or modified in retrospect.

4.4 Offline Measurements

For offline analysis, samples were regularly taken from the bioreactor and placed in pre-weighed and pre-dried microcentrifuge tubes. For biomass determination, the sample without supernatant was dried for 24 h at 103°C and weighed after cooling for 30 min. Using the filtered supernatant (filter pore size 0.45 μm, polypropylene membrane, VWR, Darmstadt, Germany), glucose and ethanol were determined by HPLC (ProStar, Varian, Walnut Creek, CA, USA): 20 μL was injected into a Rezex ROA-organic acid H+ (8%) column (Phenomenex, Aschaffenburg, Germany) operated at 70°C with 5 mM H2SO4 as eluent at a flow rate of 0.6 mL min−1; the software was Galaxie™ Chromatography (Varian, Walnut Creek, CA, USA). The offline values were not used during the estimation of the state variables and only serve to show that the estimates are accurate.

4.5 State Equations of the Cultivation Process

The bioreactor was assumed to be an ideal stirred tank reactor. The state variables are the biomass, glucose and ethanol concentrations as well as the maximum specific growth rates on glucose and on ethanol. The following state equations are therefore obtained:

$$ \frac{d}{dt}\left[\begin{array}{c}X\\ {}G\\ {}E\\ {}{\mu}_{\mathit{\max},G}\\ {}{\mu}_{\mathit{\max},E}\end{array}\right]=\left[\begin{array}{c}\left({\mu}_G+{\mu}_E\right)X\\ {}-\frac{\mu_G}{Y_{GX}}X\\ {}\left(\frac{\mu_G}{Y_{GE}}-\frac{\mu_E}{Y_{EX}}\right)X\\ {}0\\ {}0\end{array}\right] $$
(19)

where μG and μE are given as

$$ {\mu}_G=\frac{\mu_{\mathit{\max},G}\cdot G}{K_G+G} $$
(20)
$$ {\mu}_E=\frac{\mu_{\mathit{\max},E}\cdot E}{K_E+E}\cdot {\left(1-\frac{\mu_G}{\mu_{\mathit{\max},G}}\right)}^2 $$
(21)

As the state equations show, the Kalman filter is also used to estimate the maximum specific growth rates on glucose, μmax,G, and on ethanol, μmax,E. The importance of the specific growth rate for the assessment of a cultivation is discussed by Galvanauskas et al. [70].
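The state equations (19) to (21) can be written down compactly in code. The following Python sketch is only an illustration and is not the MATLAB code from the appendix; the class `Params`, the function names and all numerical parameter values are hypothetical placeholders, since the values of Table 2 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    # Hypothetical placeholder values, NOT the values of Table 2.
    K_G: float = 0.1    # Monod constant for glucose (g/L)
    K_E: float = 0.1    # Monod constant for ethanol (g/L)
    Y_GX: float = 0.15  # yield coefficient as it appears in row 2 of Eq. (19)
    Y_GE: float = 0.4   # yield coefficient as it appears in row 3 of Eq. (19)
    Y_EX: float = 0.6   # yield coefficient as it appears in row 3 of Eq. (19)


def specific_rates(G, E, mu_max_G, mu_max_E, p):
    """Specific growth rates on glucose and ethanol, Eqs. (20) and (21)."""
    mu_G = mu_max_G * G / (p.K_G + G)
    # The squared factor suppresses growth on ethanol while glucose is consumed.
    mu_E = mu_max_E * E / (p.K_E + E) * (1.0 - mu_G / mu_max_G) ** 2
    return mu_G, mu_E


def state_derivative(x, p):
    """Right-hand side of Eq. (19) for x = [X, G, E, mu_max_G, mu_max_E]."""
    X, G, E, mu_max_G, mu_max_E = x
    mu_G, mu_E = specific_rates(G, E, mu_max_G, mu_max_E, p)
    return [
        (mu_G + mu_E) * X,                      # biomass
        -mu_G / p.Y_GX * X,                     # glucose
        (mu_G / p.Y_GE - mu_E / p.Y_EX) * X,    # ethanol
        0.0,                                    # mu_max_G is modelled as constant
        0.0,                                    # mu_max_E is modelled as constant
    ]
```

With these definitions, `state_derivative([1.0, 10.0, 0.5, 0.3, 0.1], Params())` returns the five time derivatives at one operating point; the last two entries are zero because the maximum specific growth rates only change through the filter corrections.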

The ordinary Monod model for μE is extended by the squared factor in Eq. (21), so that the transition from glucose consumption to ethanol consumption is modelled. Tables 2, 3, and 4 present the parameters of the model, the initial values for the state equations and the initial values of the estimation error covariance.

Table 2 Parameter values used for the simulation model
Table 3 Initial conditions for the extended Kalman filter
Table 4 Estimated measurement noise and process noise as well as measurement model for the EKF

The Matlab code as well as the measured off- and online data of this example can be found in the appendix.

4.6 Results

Figure 4 shows the online and offline measured values of ethanol, the offline measured values of biomass and glucose, and the Kalman filter estimates of all three bioprocess variables.

Fig. 4 Online and offline values for biomass, glucose and ethanol as well as EKF estimates for these values

Figure 4 shows the typical diauxic growth pattern of baker’s yeast on glucose. First the glucose is consumed and biomass and ethanol are produced; then ethanol is converted to biomass. The offline measurements and their corresponding estimated values agree well, as can be seen in Table 5.

Table 5 Prediction error of EKF values compared to offline measurements

The root mean squared error of prediction (RMSEP) of glucose is 0.12 g L−1. The ethanol offline values during glucose consumption are mostly higher than the online measured and the predicted ones; overall, their RMSEP is 0.14 g L−1. All ethanol online measurements seem to be slightly shifted in time compared to the offline values, which might indicate the time delay due to gas transport from the fermentation broth through the headspace of the reactor to the measurement system. The biomass has an RMSEP of 0.12 g L−1, but the highest deviation can be seen shortly after ethanol is used as substrate. The values shortly before ethanol consumption might not be predicted accurately, because the model describing the switching from glucose to ethanol might be suboptimal.
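For readers who want to reproduce such error values from their own data, the following Python sketch computes an RMSEP. It assumes the usual definition, the square root of the mean squared difference between the EKF estimates and the offline reference values at the same sampling times; the function name `rmsep` and the numbers in the usage note are illustrative and are not data from this study.

```python
import math


def rmsep(predicted, measured):
    """Root mean squared error of prediction for paired values."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("predicted and measured must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A call such as `rmsep([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])` then returns the error in the unit of the inputs, here g L−1 if concentrations are passed.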

In order to investigate the influence of the measurement frequency on the performance of the EKF, we decreased the measurement frequency of the online ethanol measurements to one per hour. The results of the estimated values with the EKF are presented in Fig. 5.

Fig. 5 Online (every 1 h) and offline values for biomass, glucose and ethanol as well as EKF estimates for these values

The overall behaviour of the estimated values remains the same and is still predicted well. However, the sampling frequency influences the corrections of the estimated state during filtering: larger step changes are observed in the estimated values whenever a new measurement becomes available.

With a higher sampling frequency, these step changes are smaller. With a 5 min sampling time, the EKF was able to follow the true states of the system with a reasonably small error. More detailed information about the influence of the sampling frequency on the accuracy of the Kalman filter estimates can be found in the literature [71, 72].
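The step changes at each new measurement follow directly from the correction step of the filter. The Python sketch below shows one such correction under the assumption that the ethanol concentration, the third state variable, is measured directly, so that the measurement matrix is H = [0 0 1 0 0]; the measurement model actually used is given in Table 4 and may differ. The function name `ekf_update_ethanol` and all numbers in the usage note are hypothetical.

```python
def ekf_update_ethanol(x, P, z, R, idx=2):
    """One Kalman correction step for a scalar measurement z of state x[idx].

    x: list of state estimates, P: covariance as a list of lists,
    R: measurement noise variance. Returns (x_new, P_new, K).
    """
    n = len(x)
    innovation = z - x[idx]            # measured minus predicted ethanol
    S = P[idx][idx] + R                # innovation variance H P H^T + R
    K = [P[i][idx] / S for i in range(n)]   # Kalman gain P H^T / S
    # Every state is shifted in proportion to the innovation.
    x_new = [x[i] + K[i] * innovation for i in range(n)]
    # Covariance update P_new = (I - K H) P.
    P_new = [[P[i][j] - K[i] * P[idx][j] for j in range(n)] for i in range(n)]
    return x_new, P_new, K
```

For example, with a predicted ethanol value of 2.0, a measurement of 2.5 and equal variances `P[2][2] = R = 0.1`, the gain for ethanol is 0.5 and the estimate jumps halfway towards the measurement. The longer the filter runs without a measurement, the further the prediction can drift and the larger the innovation, and hence the step, tends to be.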

The EKF was also used for predicting the specific growth rates and their maximum values.

Figure 6 presents the estimated maximum specific growth rates with respect to glucose μmax,G and ethanol μmax,E as well as the specific growth rates themselves (μG and μE for glucose and ethanol, respectively).

Fig. 6 Estimated maximum specific growth rates with respect to glucose μmax,G and ethanol μmax,E as well as the specific growth rates (μG and μE for glucose and ethanol, respectively)

After inoculation, the specific growth rate on glucose and its maximum value increase from 0.14 h−1 to more than 0.18 h−1, but shortly thereafter they decrease again. This indicates the high sensitivity of the estimates to the measurement noise variance R and to the process noise variance with respect to μmax,G, which is Q[4], the element of Q belonging to the fourth state variable. The smaller R and the larger Q[4], the more the estimated values rely on the measurements; as a consequence, the filtered values may change whenever the measured and estimated values deviate from each other. The more glucose is consumed, the larger the difference between μmax,G and μG becomes, owing to the Monod growth kinetics. When the glucose is almost depleted, the extension to the Monod model for ethanol contributes to increasing growth on ethanol. Shortly after 2 h of cultivation time, the transition from glucose to ethanol as substrate takes place. The maximum specific growth rate on ethanol μmax,E, which did not change during the growth on glucose, starts to increase. According to the typical Monod behaviour, before ethanol is depleted, due to the low substrate concentration, μmax,E should be almost constant while μE should be increasing. However, this is not observed in Fig. 6, which is due to the fluctuation of the measured and estimated ethanol concentration.
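The influence of R and Q on how strongly the filter follows the measurements can be illustrated with a deliberately simplified scalar case. The Python sketch below assumes a single state that is modelled as constant plus process noise (as the maximum specific growth rates are in Eq. (19)) and that is measured directly; it iterates the variance recursion until the Kalman gain settles. The function name `steady_state_gain` and the numerical values are hypothetical and are not the settings of Table 4.

```python
def steady_state_gain(Q, R, p0=1.0, n_steps=1000):
    """Kalman gain of a scalar random-walk state after n_steps cycles.

    Q: process noise variance added per prediction step,
    R: measurement noise variance, p0: initial error variance.
    """
    p = p0
    K = 0.0
    for _ in range(n_steps):
        p_pred = p + Q                 # prediction: uncertainty grows by Q
        K = p_pred / (p_pred + R)      # gain: weight given to the measurement
        p = (1.0 - K) * p_pred         # correction: uncertainty shrinks
    return K


# Hypothetical settings: a larger Q or a smaller R gives a larger gain.
gain_small_q = steady_state_gain(Q=1e-4, R=1e-2)
gain_large_q = steady_state_gain(Q=1e-2, R=1e-2)
gain_large_r = steady_state_gain(Q=1e-2, R=1.0)
```

A gain close to 1 means the estimate follows each measurement almost completely, including its noise; a gain close to 0 means the estimate relies mainly on the model. This is one way to see why fluctuating ethanol measurements carry over to the estimated growth rates when Q is chosen large relative to R.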

5 Conclusion

In this chapter, the working principles as well as an overview of Kalman filter applications for state and parameter estimation in bioprocesses have been presented. Since most biotechnical processes are non-linear, non-linear versions of the Kalman filter, specifically the EKF, are the most widely applied among the extensions of the Kalman filter. However, the UKF has been gaining attention in recent years. The results in the literature indicate that UKF algorithms deliver more accurate estimates of the parameters and state variables than EKF algorithms.

In spite of the apparent success of Kalman filters for state and parameter estimation in lab-scale bioreactors, the integration of Kalman filters into industrial systems is not very widespread. Most of the process models mentioned in the literature consider noise-free ideal fermentations, whereas production-scale operations are affected by concentration gradients and disturbances. Accordingly, more effort is required towards performing simulation studies in order to develop and validate proper mathematical models of complex non-ideal bioprocesses.

Despite the numerous examples of state estimation methods for biotechnological processes in the literature, research on implementing Kalman filters for state estimation in downstream processing remains rather limited. Advances in state and parameter estimation methods for downstream processes lead to better knowledge of the location and concentration of the product and key contaminants, which is essential for process optimization and control.

So far, most Kalman filter algorithms have been implemented for monitoring fed-batch cultivations; however, more attention is required for real-time implementation of Kalman filter algorithms for controlling the feed rate and substrate production in these cultivations. Further efforts are also required towards the implementation of state estimation methods in batch and continuous cultivations.

From the presented literature, it can be concluded that the non-linear extensions of the Kalman filter are powerful tools for state estimation in bioprocesses; therefore, they can be used for the digitalization of bioprocesses. Accordingly, in a case study, a digital twin of the baker’s yeast batch fermentation process was developed by using a dynamic non-linear model of the process together with an EKF algorithm. The proposed method makes it possible to predict glucose, ethanol and biomass concentrations simultaneously from the infrequent online measurements of the ethanol concentration, which are the only measurements available. The accuracy of the estimated biomass and substrate production is in line with other studies which have also implemented an EKF algorithm for monitoring the baker’s yeast cultivation [32, 49]. However, in our application the maximum specific growth rates on glucose and ethanol are also estimated. As a consequence, the rapid and precise estimation of these variables could increase the overall knowledge integration in the digital twin of the process.

Overall, the unique advantage of online monitoring, and of digital twins of bioprocesses in general, is that they could play critical roles in bioprocess development, such as supporting problem solving in manufacturing, reducing the effort of setting up a control strategy and accelerating process performance by taking corrective actions automatically and in real time.