1 Introduction

Bioprocesses are among the most complex systems in process engineering because of their high dimensionality, nonlinearity and dynamic behaviour. The ability to control bioprocesses accurately and automatically at optimal states is of considerable interest to many fermentation industries, because it can reduce production costs and increase yield while maintaining the quality of the metabolic product [1, 2, 3, 4]. However, designing a control system for bioprocesses is not a straightforward task, owing to: (i) significant uncertainty in the model, (ii) the lack of reliable on-line sensors that can accurately detect the key state variables, (iii) the nonlinear and time-varying nature of the process, and (iv) the slow response of the process, in particular of cell and metabolite concentrations. This work focuses mainly on the second of these problems. Key state variables, such as the biomass concentration in fermentation processes, are usually measured off-line with a long measurement delay, which limits the range of control algorithms that can be applied to the process. Recently, artificial neural networks (ANNs) have drawn considerable attention for the development of on-line soft sensors [5, 6, 7, 8].

ANNs are computational systems whose architecture and operation are inspired by our knowledge of biological neural cells (neurons) in the brain. They are not simulations of real neurons, in the sense that they do not model the biology, chemistry or physics of a real neuron. They do, however, model several aspects of the information-combining and pattern-recognition behaviour of real neurons in a simple yet meaningful way. Neural networks have a remarkable capability for emulation, analysis, prediction and association, and they are able to solve difficult problems in a way that resembles human intelligence. What is unique about neural networks is their ability to learn by example; however, an ANN can and should be retrained, on-line or off-line, whenever new information becomes available. Recurrent neural networks (RNNs), a member of the ANN family, have proven to be a valuable tool and are extensively used in the modelling and control of nonlinear dynamic systems [9, 10, 11, 12].

Apart from the selection of the neural network, another important issue is the selection of appropriate state variables to be measured on-line. Soft sensors work on the basis of cause and effect: the inherent biological relation between the measured and unmeasured states can significantly affect the prediction accuracy. Carbon dioxide, pH, ethanol and dissolved oxygen (DO) can easily be measured on-line in a research laboratory using standard sensors. Among them, dissolved oxygen, which reflects the fundamental level of energy transduction in the bioreaction, is intricately linked to cellular metabolism. Nor et al. [13] used on-line dissolved oxygen measurements and mass balances to estimate the specific growth rate of a fed-batch culture of Kluyveromyces fragilis. Dissolved oxygen was also employed in [14] to detect acetate formation in Escherichia coli. In [15], neural networks were used to relate the increase in biomass concentration to the increase in lactic acid concentration; this approach requires an additional sensor for measuring the lactic acid concentration.

This paper investigates the suitability of RNNs for predicting biomass concentrations on-line in a fermentation process. The RNN input variables are the feed rate, the liquid volume and the dissolved oxygen concentration, all chosen because they can easily be measured on-line. The output of the RNN is the biomass concentration. A suitable RNN topology is selected using data generated from a mathematical model; the network is then re-trained with experimental data. The remainder of the paper covers the determination of the soft sensor structure in Sect. 2, experimental results in Sect. 3 and conclusions in Sect. 4.

2 Soft sensor structure determination and implementation

RNNs are chosen to estimate the biomass because of their strong capability to capture the dynamic information underlying input-output data pairs. The structure of the proposed neural soft sensor is given in Fig. 1. The network consists of one hidden layer, one output neuron, feedforward paths, feedback paths and tapped delay lines (TDLs). To enhance the dynamic behaviour of the sensor, the outputs of both the output layer and the hidden layer are fed back through TDLs. The output of the ith neuron in the hidden layer is of the form:

Fig. 1. Structure of the proposed neural soft sensor

$$h_{i}(t) = f_{h}\!\left( \sum_{j=0}^{n_{a}} W^{I}_{ij}\, p(t-j) + \sum_{k=1}^{n_{b}} W^{R}_{ik}\, \hat{y}(t-k) + \sum_{l=1}^{n_{c}} W^{H}_{il}\, h_{l}(t-1) + b^{H}_{i} \right)$$
(1)

where p is the neural network input, \(\hat{y}\) is the neural network output and h is the hidden neuron's output. \(b^{H}_{i}\) is the bias of the ith hidden neuron. \(n_a\), \(n_b\) and \(n_c\) are the number of input delays, the number of output feedback delays and the number of hidden neurons, respectively. \(f_h\) is a sigmoidal function, \(W^{I}_{ij}\) is the weight connecting the jth delayed input to the ith hidden neuron, \(W^{R}_{ik}\) is the weight connecting the kth delayed output feedback to the ith hidden neuron, and \(W^{H}_{il}\) is the weight connecting the lth hidden neuron output feedback to the ith hidden neuron.

Only one neuron is placed in the output layer, so the output is:

$$\hat{y}(t) = f_{Y}\!\left( \sum_{m=1}^{n_{c}} W^{Y}_{m}\, h_{m}(t) + b_{Y} \right)$$
(2)

where \(f_Y\) is a pure linear function, \(W^{Y}_{m}\) is the weight connecting the mth hidden neuron's output to the output neuron, and \(b_Y\) is the bias of the output neuron.
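To make the data flow of Eqs. (1) and (2) concrete, the following minimal numpy sketch evaluates one time step of the soft sensor. The variable names and shapes are illustrative assumptions, not the authors' implementation; a scalar input p is assumed for brevity, whereas the actual sensor has three inputs (feed rate, liquid volume and dissolved oxygen).

```python
import numpy as np

def soft_sensor_step(p_hist, y_hist, h_prev, W_I, W_R, W_H, b_H, W_Y, b_Y):
    """One time step of the soft sensor of Eqs. (1)-(2).

    p_hist : (n_a + 1,)  current and delayed inputs p(t), ..., p(t - n_a)
    y_hist : (n_b,)      delayed output feedback y_hat(t-1), ..., y_hat(t - n_b)
    h_prev : (n_c,)      hidden-layer outputs from the previous step, h(t-1)
    """
    # Eq. (1): weighted delayed inputs + output feedback + hidden feedback + bias
    a = W_I @ p_hist + W_R @ y_hist + W_H @ h_prev + b_H
    h = np.tanh(a)                 # sigmoidal hidden activation f_h
    y_hat = W_Y @ h + b_Y          # Eq. (2): pure linear output neuron f_Y
    return y_hat, h
```

At run time the caller shifts the histories p_hist and y_hist by one sample and feeds h back at the next step, which is exactly the role of the TDLs in Fig. 1.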

The mathematical fermentation model given in [16], which consists of six differential equations, is used to generate simulation data. Four different feed rate profiles are chosen to excite the model: a random step, a square wave, a sawtooth wave and a sequence resembling an industrial feeding policy; they are shown in Fig. 2. Each feed rate profile yields 150 input-output (target) pairs, corresponding to a 6-min sampling time over a 15-h fermentation.
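As an illustration, a random-step excitation of the kind used here could be generated as follows; the feed rate bounds and hold lengths are assumptions, since the paper does not state them.

```python
import numpy as np

def random_step_profile(duration_h=15.0, ts_min=6.0, f_min=0.0, f_max=0.2,
                        min_hold=5, seed=0):
    """Piecewise-constant random-step feed rate sampled every ts_min minutes
    (150 samples for a 15-h run at a 6-min sampling time)."""
    rng = np.random.default_rng(seed)
    n = int(duration_h * 60 / ts_min)
    f = np.empty(n)
    i = 0
    while i < n:
        hold = int(rng.integers(min_hold, 4 * min_hold))  # random dwell time
        f[i:i + hold] = rng.uniform(f_min, f_max)          # random level
        i += hold
    return f
```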

Fig. 2. Four different feed rate profiles

A well-known requirement when choosing the training data set is that it should cover the entire state space of the system as many times as possible. In this study the random step, which excites the process the most, is used to generate the training data set. Before an RNN is trained, the input and target data have to be pre-processed (scaled) into the range [-1, 1], the most sensitive region of the hidden layer activation function. The output of the trained network will then also lie in the range [-1, 1].
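A minimal sketch of this pre-processing, assuming simple min-max scaling (the paper does not specify the exact scaling law). The training-set extrema must be reused when scaling the validation, test and on-line data, and the inverse mapping recovers physical units from the network output.

```python
import numpy as np

def scale_to_pm1(x, x_min=None, x_max=None):
    """Linearly map x into [-1, 1]; pass the training-set min/max for new data."""
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0, x_min, x_max

def unscale_from_pm1(y, x_min, x_max):
    """Invert the mapping, e.g. to recover biomass in g/L from the output."""
    return (np.asarray(y) + 1.0) * (x_max - x_min) / 2.0 + x_min
```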

The performance function used for training and testing the neural networks is a percentage mean square error index [15], and is defined as:

$$E = \sqrt{\frac{\sum_{t=1}^{N}\left(X^{m}_{t} - \hat{X}_{t}\right)^{2}}{\sum_{t=1}^{N}\left(X^{m}_{t}\right)^{2}}} \times 100\%$$
(3)

where N is the number of sampling data pairs, \(X^{m}_{t}\) is the measured (actual) value of the biomass, and \(\hat{X}_{t}\) is the corresponding estimate predicted by the neural soft sensor.
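In code, the index of Eq. (3) reduces to a few lines; the sketch below assumes the measured and estimated trajectories are aligned arrays of equal length N.

```python
import numpy as np

def percentage_mse_index(x_measured, x_estimated):
    """Percentage mean square error index E of Eq. (3)."""
    x_m = np.asarray(x_measured, dtype=float)
    x_hat = np.asarray(x_estimated, dtype=float)
    return 100.0 * np.sqrt(np.sum((x_m - x_hat) ** 2) / np.sum(x_m ** 2))
```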

The Levenberg-Marquardt backpropagation training algorithm is adopted to train the neural networks because it converges much faster than gradient-descent backpropagation for networks of moderate size [17, 18]. The algorithm can be summarised as follows:

1. Present the input sequences to the network. Compute the corresponding network outputs with respect to the parameters \(\mathbf{X}_k\) (i.e., the weights and biases), the error vector \(\mathbf{e}\) and the overall MSE.

2. Calculate the Jacobian matrix \(\mathbf{J}\) by backpropagating the Marquardt sensitivities from the final layer of the network to the first.

3. Update the network parameters using:

$$\Delta \mathbf{X}_{k} = -\left[ \mathbf{J}^{T}(\mathbf{X}_{k})\, \mathbf{J}(\mathbf{X}_{k}) + \mu_{k} \mathbf{I} \right]^{-1} \mathbf{J}^{T}(\mathbf{X}_{k})\, \mathbf{e}$$
(4)

where \(\mu_k\) is initially chosen as a small positive value (e.g., \(\mu_k = 0.01\)).

4. Recompute the MSE using \(\mathbf{X}_k + \Delta\mathbf{X}_k\). If this new MSE is smaller than that computed in step 1, decrease \(\mu_k\), set \(\mathbf{X}_{k+1} = \mathbf{X}_k + \Delta\mathbf{X}_k\) and go back to step 1. If the new MSE is not reduced, increase \(\mu_k\) and go back to step 3.

The algorithm terminates when: (i) the norm of the gradient falls below a predetermined value, (ii) the MSE has been reduced to the error goal, (iii) \(\mu_k\) becomes too large to be increased further, or (iv) the predefined maximum number of iterations has been reached. During training, an early stopping method is useful to prevent the neural network from being over-trained. When the random step data are used to train the network, a data set with a different feed rate profile is used as the validation set. The error on the validation set is monitored during training: it normally decreases during the initial phase but, when the network begins to overfit the data, it typically begins to rise. When the validation error has increased for a specified number of iterations, training is stopped and the weights and biases at the minimum of the validation error are retained.
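A minimal numpy sketch of this training loop is given below, under the assumption that residual and Jacobian evaluations are supplied as callables (for the RNN of Fig. 1, the Jacobian would come from backpropagating the Marquardt sensitivities, as in step 2 above). It implements the update of Eq. (4), the damping adaptation of step 4 and termination criteria (i)-(iv); validation-based early stopping would wrap this loop and is omitted for brevity.

```python
import numpy as np

def levenberg_marquardt(x0, residuals, jacobian, mu=0.01, mu_factor=10.0,
                        grad_tol=1e-6, mse_goal=1e-8, mu_max=1e10,
                        max_iter=200):
    """Minimal LM loop; residuals(x) -> e (N,), jacobian(x) -> J (N, n).
    x stacks all network weights and biases into one parameter vector."""
    x = np.asarray(x0, dtype=float)
    e = residuals(x)
    mse = np.mean(e ** 2)
    for _ in range(max_iter):                       # criterion (iv)
        J = jacobian(x)
        g = J.T @ e                                 # gradient term J^T e
        if np.linalg.norm(g) < grad_tol or mse < mse_goal:
            break                                   # criteria (i) and (ii)
        while mu < mu_max:
            # Eq. (4): damped Gauss-Newton step
            dx = -np.linalg.solve(J.T @ J + mu * np.eye(x.size), g)
            e_new = residuals(x + dx)
            mse_new = np.mean(e_new ** 2)
            if mse_new < mse:                       # step accepted: relax damping
                x, e, mse, mu = x + dx, e_new, mse_new, mu / mu_factor
                break
            mu *= mu_factor                         # step rejected: damp harder
        else:
            break                                   # criterion (iii): mu too large
    return x, mse
```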

The remaining data sets, which are not seen by the neural network during training, are used to examine the trained network. For each network structure, 50 networks are trained, and the one that produces the smallest mean error on the test data sets is retained. The error between the network output and the target output is used to evaluate the “goodness” of the network. Errors for various combinations of input and output delays (with the hidden layer output feedback delay fixed at 1) are shown in Fig. 3, from which it can be seen that networks with 12 hidden neurons frequently outperform those with 6 hidden neurons. The errors produced by the 0/4/1 structure (four output feedback delays, one hidden layer output feedback delay and no input delays) are smaller than the others, and are very close for the two numbers of hidden neurons. The network with six hidden neurons and this structure is chosen for on-line biomass estimation because it has fewer hidden neurons.
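The selection procedure can be summarised by a short sketch; train_fn and eval_fn are hypothetical helpers (one trains a single randomly initialised network with early stopping, the other returns its mean percentage error over the test sets), not functions from the paper.

```python
def select_best_network(structures, train_fn, eval_fn, n_trials=50):
    """For each candidate delay structure, train n_trials randomly initialised
    networks and keep the single network with the smallest mean test error,
    mirroring the selection behind Fig. 3."""
    best_err, best_net, best_struct = float("inf"), None, None
    for struct in structures:               # e.g. (n_a, n_b, hidden-fb) tuples
        for trial in range(n_trials):
            net = train_fn(struct, seed=trial)   # one trained candidate network
            err = eval_fn(net)                   # mean E (Eq. 3) over test sets
            if err < best_err:
                best_err, best_net, best_struct = err, net, struct
    return best_net, best_struct, best_err
```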

Fig. 3. Estimation mean error on testing data sets for neural networks with different combinations of delays. ‘0/4/1’ indicates that the number of input delays = 0, the number of output feedback delays = 4 and the number of hidden layer output feedback delays = 1

One of the simulation results is plotted in Fig. 4. The soft sensor provides a good prediction of biomass growth; the percentage mean square error of prediction is less than 3%. It should be noted that this result rests on the assumption that the training, validation and testing data sets are noise-free. This assumption does not hold in real environments: measurement errors and noisy input and output data may degrade the accuracy of the biomass estimation.

Fig. 4. Simulation result of biomass prediction using the 6-hidden-neuron network for a fed-batch fermentation process

3 Experimental results

The yeast strain Saccharomyces cerevisiae, produced by Goodman Fielder Milling and Baking N.Z. Ltd., was grown in a YEPD medium [19] with the following composition: dextrose 20 g/L, yeast extract 10 g/L, peptone 20 g/L, and commercial anti-foam 10 drops/L. The starter culture was grown in a shaker at 30°C and 200 rpm for 60–90 min.

Three laboratory experiments with different feed rates (Fig. 2) were carried out using 3-litre fermentors (New Brunswick Scientific Co., Inc., USA); see Fig. 5. These experiments are used to examine the suitability of the proposed soft sensor for real fermentation processes. Three sets of data were collected; 27 samples were taken during the 8-h fermentation of each run. One data set is used for re-training the neural network, one for validation and one for testing. Linear interpolation is used to equalise the sampling intervals.
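The interpolation step might look as follows with numpy; the uniform 27-point grid over 8 h is an assumption chosen to match the sampling described above.

```python
import numpy as np

def resample_uniform(t_sample, x_sample, duration_h=8.0, n_points=27):
    """Linearly interpolate irregularly timed off-line samples onto a
    uniform time grid so the sampling interval is equal across runs."""
    t_uniform = np.linspace(0.0, duration_h * 60.0, n_points)   # minutes
    return t_uniform, np.interp(t_uniform, t_sample, x_sample)
```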

Fig. 5. Experimental setup

One of the prediction trajectories is presented in Fig. 6. The network starts from an arbitrary initial point. As can be seen from Fig. 6, the soft sensor converges within a very short time and predicts the trend of biomass growth, but its estimates fluctuate considerably. To overcome this problem, two additional delayed inputs are added to the proposed network (the RNN obtained in the previous section). As shown in Fig. 7, a smooth prediction is then achieved, although the error is slightly higher than that in Fig. 6. By also incorporating the hidden layer feedback (with unit delay, see Fig. 1) into the input of the network, the prediction mean error is reduced to 10.3%; the resulting biomass estimation is given in Fig. 8. The experimental results show that the smallest percentage mean error is obtained with the neural soft sensor that has four output feedback delays, one hidden layer output feedback delay and two input delays.

Fig. 6. On-line biomass prediction in a fed-batch baker’s yeast fermentation process with zero network input delays, four output feedback delays and one hidden layer output feedback delay

Fig. 7. On-line biomass prediction in a fed-batch baker’s yeast fermentation process with two network input delays, four output feedback delays and no hidden layer output feedback delays

Fig. 8. On-line biomass prediction in a fed-batch baker’s yeast fermentation process with two network input delays, four output feedback delays and one hidden layer output feedback delay

4 Conclusion

This paper has investigated the suitability of recurrent neural networks for predicting biomass concentrations on-line. The inputs to the proposed recurrent neural network are the feed rate, the liquid volume and the dissolved oxygen concentration. A suitable topology for the network is first obtained using data generated by a mathematical model; this topology is then re-trained and evaluated using experimental data. From the results obtained on both simulated and real processes, it can be concluded that RNNs are a powerful tool for implementing an on-line biomass soft sensor for fermentation processes. The proposed neural network predicts the biomass concentration to within 11% of its true value.