Introduction

Currently, ethanol is the main substitute for gasoline and can be obtained from alcoholic fermentation [1,2,3]. The main raw materials used for the industrial production of ethanol are corn and sugarcane. Other sources of biomass, such as barley and wheat, have also been studied and used. However, the increase in world demand encourages the search for alternative raw materials. In this context, some authors evaluated the potential of cashew apple juice as a culture medium [4,5,6,7,8], and ethanol was produced at laboratory scale with high yields and productivities.

In these processes, yeasts are widely used, and Saccharomyces cerevisiae, which naturally evolved to efficiently consume sugars such as sucrose, is one of the most important microorganisms used in industrial ethanol production due to its robustness, stress tolerance, genetic accessibility and simple nutrient requirements [9]. Moreover, it is one of the yeasts most studied by the scientific community [9,10,11]. In this work, however, a flocculent S. cerevisiae was evaluated with the aim of eliminating the cost of cell separation.

The modified S. cerevisiae (FLO5α gene) tends to form small flocs that settle at the bottom of the fermenter at the end of the reaction. This characteristic allows the microorganism to be easily separated from the fermented medium, which reduces process costs because centrifugation becomes unnecessary [12]. Nevertheless, to favor substrate diffusion into the cells or inside the flocs, stirring is essential to prevent the cells from assuming a flocculated state during fermentation [13,14,15]. Therefore, the influence of stirring speed in the reaction medium containing the flocculating S. cerevisiae should be considered as a fermentation parameter [7].

Modeling of real fermentation processes presents a high degree of complexity due to genetic characteristics, physicochemical and biochemical laws involved, besides the non-linearity of its kinetics [16]. Biochemical reactions involve many stages of multiple reactions (in series and parallel) and depend on several transport phenomena that may limit the observed reaction rates [17].

In general, a rigorous description of the processes involved can be difficult to apply in kinetic modeling due to inherent non-linearity, lack of information and experimental inaccuracy, as well as deviations from ideal conditions [18]. Thus, finding a faster and simpler way to describe fermentative processes may be more advantageous. Hybrid modeling emerges as an alternative that combines prior knowledge of the process, expressed through mass balances, with artificial neural networks (ANNs) that describe the unknown kinetics of the process. Several authors have already proposed the hybrid modeling strategy in their studies and concluded that it is reliable [17, 19,20,21].

A mathematical model allows the optimization of the physical–chemical parameters that influence the overall productivity of the process. In this study, operating conditions such as temperature, stirring speed, and initial cell and substrate concentrations play a synergistic role in controlling cell growth and ethanol production. Therefore, the use of advanced modeling and optimization tools, including artificial neural networks (ANNs) and particle swarm optimization (PSO), was proposed. These algorithms have been found to be more efficient than other statistical optimization techniques in deriving global optimal solutions for complex and non-linear bioprocesses [22,23,24,25].

PSO is a non-deterministic, bio-inspired, population-based optimization method that can be applied to non-linear and non-continuous problems with multiple variables [26]. It is based on a constructive method to obtain the initial population and a local search technique to improve the solutions of the population. To this end, the individuals (solutions) of the population evolve according to specific rules that consider the exchange of information among individuals, leading the population to an optimal solution [27]. Compared with other evolutionary algorithms, PSO has some advantages, such as ease of implementation, better efficiency, lower memory requirements, and constructive cooperation between individuals. Therefore, it is more likely to “flock” quickly into the better solution areas and to discover the optimal results much faster [26, 28].

The originality of this work consists in the use of an HNM-PSO combined strategy to optimize the operational conditions, creating a faster, simpler and more efficient alternative to a mechanistic model. Therefore, a hybrid neural model (HNM) was proposed to represent the alcoholic fermentation of cashew apple juice by a flocculating S. cerevisiae, including the influence of cell and substrate concentration, as well as temperature and stirring speed. Then, a combined HNM-PSO algorithm was implemented to maximize ethanol production by changing operational conditions.

Materials and methods

Experimental data

Ethanol was produced by Saccharomyces cerevisiae CCA008 (with the modified gene FLO5α) using cashew apple juice (750 mL of medium at pH 4.5) in a 1 L bench-scale bioreactor (Tec-Bio, Model 1.5, Tecnal, SP, Brazil). All experiments were performed in duplicate as reported by Pinheiro et al. [29].

The operating conditions, efficiency and productivity of the fermentative process are summarized in Table 1. The initial concentration of substrate was varied from 70 to 170 g L−1, temperature from 26 to 42 °C, initial cell concentration from 4 to 10 g L−1 and stirring speed from 80 to 800 rpm with a processing time of 10 h.

Table 1 Operational conditions of batch fermentation to produce ethanol by Saccharomyces cerevisiae CCA008 using cashew apple juice as substrate

Data processing

Regression, interpolation, and normalization of the experimental data were performed to increase the number of available points by predicting intermediate points. This data processing was necessary because ANN algorithms require a large amount of data. Moreover, mathematical model calibration and neural network training should avoid bias in the data caused, for example, by experimental measurement error. Therefore, Boltzmann’s regression model [30] (Eq. 1) was selected, because it has the functional features needed to fit the behavior of the curves (inflection points and asymptotes). The least squares method was applied to determine the function parameters for the substrate, cell and product concentration profiles at each operational condition. The coefficient of determination (R²) was used to assess the quality of each fit and varied between 0.97 and 1.00:

$$y = \frac{{A_{1} - A_{2} }}{{1 + e^{{\frac{{\left( {x - x_{0} } \right)}}{dx}}} }} + A_{2}$$
(1)

The functions were interpolated at intervals of 30 min to quadruple the data available for ANN training. The specific rates of glucose consumption, cell growth and ethanol production were then estimated by calculating the derivative of the corresponding Boltzmann equation for the 19 assays, according to Eqs. 2 to 4:

$$\mu_{X} \left( t \right) = \frac{{\frac{dX\left( t \right)}{dt}}}{X\left( t \right)}$$
(2)
$$\mu_{S} \left( t \right) = \frac{{\frac{dS\left( t \right)}{dt}}}{X\left( t \right)}$$
(3)
$$\mu_{P} \left( t \right) = \frac{{\frac{dP\left( t \right)}{dt}}}{X\left( t \right)}$$
(4)
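As a minimal sketch of how Eqs. 1 to 4 fit together, the Boltzmann function and its analytic derivative can be evaluated on a 30 min grid to obtain a specific rate. The function names are hypothetical and the parameter values are illustrative placeholders, not fitted values from this work. In Eq. 1, A1 and A2 are taken as the asymptotes for small and large x, x0 as the inflection point and dx as the width of the transition.

```python
import math

def boltzmann(x, a1, a2, x0, dx):
    """Boltzmann sigmoid (Eq. 1): tends to a1 for x << x0 and to a2 for x >> x0."""
    return (a1 - a2) / (1.0 + math.exp((x - x0) / dx)) + a2

def boltzmann_derivative(x, a1, a2, x0, dx):
    """Analytic derivative dy/dx of Eq. 1."""
    u = math.exp((x - x0) / dx)
    return -(a1 - a2) * u / (dx * (1.0 + u) ** 2)

def specific_growth_rate(t, params_x):
    """mu_X(t) = (dX/dt) / X(t) (Eq. 2), evaluated from the cell profile."""
    return boltzmann_derivative(t, *params_x) / boltzmann(t, *params_x)

# Illustrative (not fitted) parameters: cells rising from 5 to 12 g/L around t = 4 h.
PARAMS_X = (5.0, 12.0, 4.0, 1.0)

# Interpolation grid: 0 to 10 h in 30 min steps (21 points).
time_grid = [0.5 * k for k in range(21)]
mu_x_profile = [specific_growth_rate(t, PARAMS_X) for t in time_grid]
```

The rates of Eqs. 3 and 4 would follow the same pattern, dividing the derivative of the substrate or product profile by the cell profile X(t).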

In summary, the implemented algorithm (Fig. 1) to treat the data requires as input the experimental dataset, including duplicates, a function for regression (Boltzmann’s regression model) and a time interval in which the data points will be interpolated. As output, the concentration and specific rate profiles are obtained, as well as the interpolated data for both, also called pseudo-experimental data.

Fig. 1 Algorithm diagram implemented for data processing

Subsequently, the pseudo-experimental data must be normalized before applying the artificial intelligence methods, since normalization significantly improves the data distribution. Numerically, it brings variables whose orders of magnitude can differ widely onto a common scale. In this work, min–max normalization to the unit interval was adopted (Eq. 5):

$$z_{i}^{n} = \frac{{z_{i} - z_{i}^{min} }}{{z_{i}^{max} - z_{i}^{min} }}$$
(5)
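A short sketch of Eq. 5 and its inverse is given below. The helper names are hypothetical, and the temperature list only spans the 26 to 42 °C range reported above; the intermediate values are illustrative.

```python
def min_max_normalize(values):
    """Scale values to the unit interval (Eq. 5); also return the (min, max) used."""
    z_min, z_max = min(values), max(values)
    span = z_max - z_min
    if span == 0:
        raise ValueError("min-max normalization is undefined for constant data")
    return [(z - z_min) / span for z in values], (z_min, z_max)

def denormalize(z_n, z_min, z_max):
    """Invert Eq. 5 to recover a value in its original units."""
    return z_n * (z_max - z_min) + z_min

# Illustrative temperatures (degC) covering the experimental range.
temperatures = [26.0, 30.0, 34.0, 38.0, 42.0]
scaled, (t_min, t_max) = min_max_normalize(temperatures)
```

Keeping the minimum and maximum of each variable is what later allows network outputs to be converted back to physical units.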

Neural network development

Figure 2 shows the typical structure of the neural networks developed, with input and output data fed for training. The networks consist of three types of layers: input, hidden and output. The interconnection between the neurons in each layer is defined by weights and biases. The ANN learns the cause–effect relationship between the input and output variables of the given dataset by updating its weights so that the error between the given data and the simulated output is minimized.

Fig. 2 Typical ANN structure

In this work, the neural networks described were developed through an iterative procedure implemented in Python/IPython Notebook (version 2.7.8) with the PyBrain machine learning library and its backpropagation trainer.

Of the 19 operational conditions, one experiment (assay 13) was randomly chosen for validation and the other 18 were used to train and test the ANNs: their data were randomly split into two groups, with 75% reserved for the training phase and the remaining 25% for testing the neural networks.
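The partition described above can be sketched as follows. This is an illustration under stated assumptions: the split is assumed to act on individual data points pooled from the 18 learning assays, each assay is assumed to contribute 21 points (0 to 10 h every 30 min), and the function name and seed are hypothetical.

```python
import random

def split_assays(n_assays, validation_assay, points_per_assay,
                 train_fraction=0.75, seed=0):
    """Hold out one assay, then randomly split the remaining points into train/test."""
    rng = random.Random(seed)
    learning_assays = [a for a in range(1, n_assays + 1) if a != validation_assay]
    # Pool (assay, point index) pairs from the learning assays only.
    samples = [(a, k) for a in learning_assays for k in range(points_per_assay)]
    rng.shuffle(samples)
    n_train = int(train_fraction * len(samples))
    return samples[:n_train], samples[n_train:]

train_set, test_set = split_assays(n_assays=19, validation_assay=13,
                                   points_per_assay=21)
```

Holding out a complete assay keeps the validation data entirely unseen during training and testing.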

The architecture of the ANN (number of nodes in each layer and number of hidden layers) was defined by a trial-and-error procedure. For that, the coefficient of determination (R²), as well as the maximum and average errors between the experimental data and the ANN predictions, were monitored.

Mathematical modeling

Hybrid neural model (HNM)

The fermentative models consist of a set of differential equations obtained by combining the batch reactor mass balances for each component (cell, substrate and product) with the specific rates of reaction (μ). The simplifying hypotheses for the mathematical model are:

  i. All cells in the fermentative medium were viable;
  ii. Perfectly mixed system, justified by the presence of a mechanical stirring device;
  iii. Isothermal operation, since the bioreactor was equipped with a temperature control system;
  iv. Constant reaction volume;
  v. Substrate consumption for cell maintenance was neglected.

$$\frac{dX}{dt} = \mu_{X} . X$$
(6)
$$\frac{dS}{dt} = - \mu_{S} . X$$
(7)
$$\frac{dP}{dt} = \mu_{P} . X$$
(8)

In this case, the HNM proposed for the alcoholic fermentation of cashew apple juice by S. cerevisiae CCA008 combines mass balances with ANNs. The ANNs work as estimators for the specific rates of cell growth, substrate consumption and product formation. Thus, three networks were created: ANN1 (\(\mu_{X}\)), ANN2 (\(\mu_{S}\)) and ANN3 (\(\mu_{P}\)).

For better performance in the ANN training step, the input and output layers were fed with normalized data (Eqs. 9 and 10), assigning the same weight to each input variable:

$$\mu^{n} = f\left( {X^{n} , T^{n} ,S^{n} ,N^{n} } \right)$$
(9)
$$ANN = f\left( {\frac{{X - X^{min} }}{{X^{max} - X^{min} }}, \frac{{T - T^{min} }}{{T^{max} - T^{min} }},\frac{{S - S^{min} }}{{S^{max} - S^{min} }},\frac{{N - N^{min} }}{{N^{max} - N^{min} }}} \right)$$
(10)

As the specific rates predicted by the ANNs are normalized, an algebraic manipulation combining the normalization function (Eq. 11) with Eqs. 6 to 8 was necessary, resulting in the hybrid model presented in Eqs. 11 to 15:

$$\mu^{n} = \frac{{\mu - \mu^{min} }}{{\mu^{max} - \mu^{min} }}$$
(11)
$$\mu = \mu^{n} .\left( {\mu^{max} - \mu^{min} } \right) + \mu^{min}$$
(12)
$$\frac{dX}{dt} = \left[ {ANN1 .\left( {\mu_{X}^{max} - \mu_{X}^{min} } \right) + \mu_{X}^{min} } \right] .X$$
(13)
$$\frac{dS}{dt} = - \left[ {ANN2 .\left( {\mu_{S}^{max} - \mu_{S}^{min} } \right) + \mu_{S}^{min} } \right] . X$$
(14)
$$\frac{dP}{dt} = \left[ {ANN3 .\left( {\mu_{P}^{max} - \mu_{P}^{min} } \right) + \mu_{P}^{min} } \right] . X$$
(15)
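The right-hand side of Eqs. 13 to 15 can be sketched as below. The three callables stand in for the trained networks and simply return a constant normalized rate. They, the bounds dictionary and the function names are hypothetical placeholders rather than the trained ANNs or fitted limits of this work.

```python
def denormalize(mu_n, mu_min, mu_max):
    """Eq. 12: convert a normalized rate back to physical units."""
    return mu_n * (mu_max - mu_min) + mu_min

def hnm_rhs(x_conc, inputs_n, anns, bounds):
    """Return (dX/dt, dS/dt, dP/dt) following Eqs. 13 to 15.

    x_conc   -- cell concentration X in g/L (not normalized)
    inputs_n -- normalized inputs (X, T, S, N) fed to each network
    anns     -- three callables playing the role of ANN1, ANN2 and ANN3
    bounds   -- (min, max) of each specific rate, keyed by "X", "S", "P"
    """
    ann1, ann2, ann3 = anns
    mu_x = denormalize(ann1(*inputs_n), *bounds["X"])
    mu_s = denormalize(ann2(*inputs_n), *bounds["S"])
    mu_p = denormalize(ann3(*inputs_n), *bounds["P"])
    return mu_x * x_conc, -mu_s * x_conc, mu_p * x_conc

# Hypothetical stand-ins: every "network" returns a normalized rate of 0.5.
STAND_IN_ANNS = tuple(lambda xn, tn, sn, nn: 0.5 for _ in range(3))
RATE_BOUNDS = {"X": (0.0, 0.2), "S": (0.0, 2.0), "P": (0.0, 1.0)}

derivatives = hnm_rhs(10.0, (0.5, 0.5, 0.5, 0.5), STAND_IN_ANNS, RATE_BOUNDS)
```

Separating the denormalization step from the mass balances means the networks can be retrained or replaced without touching the differential equations.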

HNM implementation

The implemented HNM is outlined in Fig. 3. Initially, the operational conditions are specified to calculate the initial specific rates of reaction (ANN1, ANN2 and ANN3). Next, the HNM is solved by combining the previously trained ANNs with the developed mathematical model and two mechanistic conditions that guarantee the physical meaning of the model:

Fig. 3 HNM implementation flowchart

  i. Maximum theoretical yield for ethanol production is 0.511 g ethanol/g glucose, due to stoichiometry.
  ii. No reaction takes place in the absence of substrate (\(dX/dt = 0\), \(dS/dt = 0\) and \(dP/dt = 0\)).

Thus, the specific rates can be estimated as a function of the conditions of the fermentative medium (substrate and cell concentrations, temperature and stirring speed) at each instant of the reaction. The solution of the ODE system at instant \(t + \Delta t\) provides the new conditions, and the procedure is repeated until the specified final reaction time is reached, yielding the concentration profiles of biomass, substrate and product.
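The time-stepping procedure and the two mechanistic conditions can be illustrated with the sketch below. It is a simplified stand-in: an explicit Euler scheme is assumed (the solver actually used is not stated), the yield limit is applied step by step as one possible way to enforce condition (i), and `constant_rates` is a hypothetical placeholder for the trained ANNs at fixed temperature and stirring speed.

```python
Y_MAX = 0.511  # maximum theoretical yield, g ethanol per g sugar

def simulate_hnm(x0, s0, p0, rates, t_end=10.0, dt=0.01):
    """Explicit Euler integration of dX/dt = mu_X X, dS/dt = -mu_S X, dP/dt = mu_P X."""
    x, s, p = x0, s0, p0
    n_steps = int(round(t_end / dt))
    profile = [(0.0, x, s, p)]
    for step in range(1, n_steps + 1):
        if s > 0.0:
            mu_x, mu_s, mu_p = rates(x, s)
            ds = min(mu_s * x * dt, s)            # cannot consume more than remains
            dp = min(mu_p * x * dt, Y_MAX * ds)   # condition (i): yield <= 0.511
            x += mu_x * x * dt
            s -= ds
            p += dp
        # condition (ii): with no substrate left, X, S and P stay unchanged
        profile.append((step * dt, x, s, p))
    return profile

def constant_rates(x, s):
    """Hypothetical placeholder for ANN1 to ANN3: constant specific rates in 1/h."""
    return 0.05, 1.5, 0.9

profile = simulate_hnm(x0=5.0, s0=50.0, p0=0.0, rates=constant_rates)
```

In this example the placeholder rates would exceed the stoichiometric yield, so the cap is active and the final ethanol concentration equals 0.511 times the substrate consumed.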

Accuracy assessment

The accuracy of the HNM was evaluated by statistical analysis using the residual standard deviation (RSD, %) and the modified F test. The RSD suggested by Cleran et al. [31] (Eq. 16) was used to assess the quality of the model predictions:

$$RSD = \sqrt[{}]{{\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i}^{exp} - y_{i}^{cal} } \right)^{2} }}{n}}}$$
(16)
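Eq. 16 can be computed as in the sketch below. The function name is hypothetical, and since the conversion to a percentage is not detailed here, the value is returned in the units of the data.

```python
import math

def residual_standard_deviation(y_exp, y_cal):
    """RSD of Eq. 16: root of the mean squared difference between data and model."""
    if len(y_exp) != len(y_cal) or not y_exp:
        raise ValueError("y_exp and y_cal must be non-empty and of equal length")
    squared_error = sum((e - c) ** 2 for e, c in zip(y_exp, y_cal))
    return math.sqrt(squared_error / len(y_exp))

# Illustrative values only: each prediction is off by exactly 1 g/L.
rsd_example = residual_standard_deviation([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```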

The modified F test is a way to discriminate between models by calculating the variance of the error between experimental and theoretical data [32]. This hypothesis test can be used to verify the adequacy of a mathematical model: the model is adequate when the average experimental error of the data is greater than the apparent experimental error calculated by the model (\({\mathcal{E}}^{exp} > {\mathcal{E}}^{apparent}\)). From Eqs. 17 and 18, the apparent experimental error can be estimated (Eq. 19):

$$S_{m}^{2} = \frac{{\mathop \sum \nolimits_{j}^{n} \left[ {\left( {X_{j}^{exp} - X_{j}^{cal} } \right)^{2} + \left( {S_{j}^{exp} - S_{j}^{cal} } \right)^{2} + \left( {P_{j}^{exp} - P_{j}^{cal} } \right)^{2} } \right]}}{3n - p}$$
(17)
$$S_{e}^{2} = \frac{{{\mathcal{E}}^{2} \mathop \sum \nolimits_{j}^{n} \left[ {\left( {X_{j}^{exp} } \right)^{2} + \left( {S_{j}^{exp} } \right)^{2} + \left( {P_{j}^{exp} } \right)^{2} } \right]}}{{3n - n_{v} }}$$
(18)
$${\mathcal{E}}^{exp} > \sqrt[{}]{{\frac{{S_{m}^{2} .\left( {3n - p} \right)}}{{F_{tab} .\mathop \sum \nolimits_{j}^{n} \left[ {\left( {X_{j}^{exp} } \right)^{2} + \left( {S_{j}^{exp} } \right)^{2} + \left( {P_{j}^{exp} } \right)^{2} } \right]}}}}$$
(19)
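A minimal sketch of Eqs. 17 and 19 follows, implementing Eq. 19 as written above. The function names are hypothetical, the tabulated value `f_tab` is assumed to be supplied by the user from an F distribution table, and the data are illustrative.

```python
import math

def model_variance(exp, cal, n_params):
    """S_m^2 of Eq. 17; exp and cal are lists of (X, S, P) tuples."""
    n = len(exp)
    ssr = sum((e - c) ** 2
              for point_exp, point_cal in zip(exp, cal)
              for e, c in zip(point_exp, point_cal))
    return ssr / (3 * n - n_params)

def apparent_error(exp, cal, n_params, f_tab):
    """Right-hand side of Eq. 19: lower bound on the experimental error."""
    n = len(exp)
    s_m2 = model_variance(exp, cal, n_params)
    sum_sq = sum(v ** 2 for point in exp for v in point)
    return math.sqrt(s_m2 * (3 * n - n_params) / (f_tab * sum_sq))

# Illustrative data: two time points, model off by 1 g/L in product only.
exp_data = [(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
cal_data = [(3.0, 4.0, 1.0), (3.0, 4.0, 1.0)]
eps_apparent = apparent_error(exp_data, cal_data, n_params=1, f_tab=4.0)
```

The model would then be judged adequate when the measured experimental error exceeds `eps_apparent`.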

Optimization method

Particle swarm optimization—PSO

In this work, the optimization aims to determine the operational conditions for the alcoholic fermentation of cashew apple juice that maximize efficiency and ethanol productivity. For that, PSO was combined with the HNM.

PSO uses a population called a swarm, in which each individual is called a particle. According to Jiao, Lian and Gu [33], a particle \(i\) at iteration \(k\) moves through the search space with two attributes:

  • The current position within the search N-dimensional space \(X_{i}^{k} = \left( {x_{1}^{k} , \ldots ,x_{n}^{k} , \ldots ,x_{N}^{k} } \right)\) of the problem, with \(x_{n}^{min} \le x_{n}^{k} \le x_{n}^{max}\) for each \(n \in \left[ {1, N} \right]\), where \(x_{n}^{min}\) and \(x_{n}^{max}\) are the limits of coordinate \(n\)

  • Its speed is vectorially represented by \(V_{i}^{k} = \left( {v_{1}^{k} , \ldots ,v_{n}^{k} , \ldots ,v_{N}^{k} } \right)\) in the same N-dimensional space of the problem.

After each iteration, the speed and position of all particles are updated according to the two best values found during the search. The first, pbest, is the best value found by each individual particle during its lifetime. The second, gbest, is the best value found by the whole swarm. With these two values, the speed and position of the particles are updated by Eqs. 20 and 21, in which \(w\) is the inertia weight, \(c_{1}\) and \(c_{2}\) are the cognitive and social acceleration coefficients, and \(r_{1}\) and \(r_{2}\) are random numbers uniformly distributed in [0, 1]:

$$V_{i}^{k + 1} = w . V_{i}^{k} + c_{1} .r_{1} .\left( {pbest_{i}^{k} - X_{i}^{k} } \right) + c_{2} .r_{2} .\left( {gbest^{k} - X_{i}^{k} } \right)$$
(20)
$$X_{i}^{k + 1} = X_{i}^{k} + V_{i}^{k + 1}$$
(21)
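The update rules of Eqs. 20 and 21 can be illustrated with a small self-contained PSO. This is a didactic sketch, not the library implementation used in this work: the parameter values, the clipping of positions to the side bounds, the in-iteration update of gbest and the toy objective are all assumptions made for illustration.

```python
import random

def pso_maximize(objective, bounds, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for n, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Eq. 20: inertia + pull towards pbest + pull towards gbest
                vel[i][n] = (w * vel[i][n]
                             + c1 * r1 * (pbest[i][n] - pos[i][n])
                             + c2 * r2 * (gbest[n] - pos[i][n]))
                # Eq. 21, followed by clipping to the search-space limits
                pos[i][n] = min(max(pos[i][n] + vel[i][n], lo), hi)
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_objective(p):
    """Concave test function with its maximum (zero) at (1, -2)."""
    return -((p[0] - 1.0) ** 2) - (p[1] + 2.0) ** 2

best_position, best_value = pso_maximize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
```

In the fermentation problem, the toy objective would be replaced by a function that solves the HNM for a candidate set of operating conditions and returns the quantity to be maximized.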

PSO implementation

The first step for PSO implementation is to define the control variables and the objective function to be used for maximization.

  i. Control variables: cells and substrate initial concentrations, temperature and stirring speed.
  ii. Objective function: efficiency and productivity.

The objective function associates each point of the solution space with a real number that measures the quality of the response with respect to the initial objective. Since analyzing each function individually would not be sufficient to determine the optimal operating point, the problem becomes multiobjective. To simplify the optimization, the objective functions were aggregated into a single function by taking the geometric mean of efficiency and productivity (F3). Equations 22, 23 and 24 show the objective functions for efficiency (F1), productivity (F2) and their geometric mean (F3), respectively:

$$F_{1} = max\left( {\frac{{P_{f} }}{{0.511 . \left( {S_{0} - S_{f} } \right)}}} \right)$$
(22)
$$F_{2} = max\left( {\frac{{P_{f} }}{{t_{f} }}} \right)$$
(23)
$$F_{3} = max\left( {\sqrt[{}]{{\left( {\frac{{P_{f} }}{{t_{f} }}} \right) .\left( {\frac{{P_{f} }}{{0.511 . \left( {S_{0} - S_{f} } \right)}}} \right)}}} \right)$$
(24)
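The three objective functions can be written compactly as below. The function names are hypothetical and the numerical values are illustrative, chosen so that the efficiency is exactly 100%.

```python
import math

Y_THEORETICAL = 0.511  # g ethanol per g sugar

def efficiency(p_f, s_0, s_f):
    """Argument of F1 (Eq. 22): ethanol produced over the stoichiometric maximum."""
    consumed = s_0 - s_f
    if consumed <= 0.0:
        raise ValueError("no substrate was consumed")
    return p_f / (Y_THEORETICAL * consumed)

def productivity(p_f, t_f):
    """Argument of F2 (Eq. 23): final ethanol concentration per unit time."""
    return p_f / t_f

def combined_objective(p_f, s_0, s_f, t_f):
    """Argument of F3 (Eq. 24): geometric mean of productivity and efficiency."""
    return math.sqrt(productivity(p_f, t_f) * efficiency(p_f, s_0, s_f))

# Illustrative case: 51.1 g/L ethanol from 100 g/L of sugar in 10 h.
f1 = efficiency(51.1, 100.0, 0.0)
f2 = productivity(51.1, 10.0)
f3 = combined_objective(51.1, 100.0, 0.0, 10.0)
```

Because efficiency is dimensionless and productivity carries units of g L−1 h−1, the geometric mean is a ranking score rather than a physical quantity.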

Due to the reaction stoichiometry, the theoretical yield (YP/S) for ethanol production is 0.511 g ethanol/g (glucose + fructose), which limits the efficiency obtained by the HNM and avoids unrealistic results. Thus, Eqs. 25 and 26 represent the inequality constraints used in the optimization:

$$S_{f} \ge 0$$
(25)
$$max\left( {\frac{{P_{f} }}{{S_{0} - S_{f} }}} \right) < 0.511$$
(26)
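One common way to enforce such inequality constraints in a population-based optimizer is to penalize infeasible candidates. The sketch below illustrates that idea only: the handling of constraint violations is not detailed here, so the penalty approach, its value and the function names are assumptions.

```python
Y_THEORETICAL = 0.511  # g ethanol per g sugar

def is_feasible(p_f, s_0, s_f):
    """Check the inequality constraints of Eqs. 25 and 26 for one simulation."""
    if s_f < 0.0:                      # Eq. 25: final substrate cannot be negative
        return False
    consumed = s_0 - s_f
    if consumed <= 0.0:                # yield is undefined without consumption
        return False
    return p_f / consumed < Y_THEORETICAL   # Eq. 26: below stoichiometric yield

def penalized_objective(value, p_f, s_0, s_f, penalty=-1.0e6):
    """Return the objective value, or a large negative score if infeasible."""
    return value if is_feasible(p_f, s_0, s_f) else penalty

# Illustrative candidates: the second violates the stoichiometric limit.
score_ok = penalized_objective(2.5, p_f=50.0, s_0=100.0, s_f=0.0)
score_bad = penalized_objective(2.6, p_f=52.0, s_0=100.0, s_f=0.0)
```

Since the optimizer maximizes, a large negative score steers the swarm away from infeasible regions without requiring a dedicated constrained solver.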

Once the objective functions and inequality constraints are defined, the side bounds of the search space for the variables of interest are set as presented in Table 2.

Table 2 Side bounds of the search space

The optimization algorithm is illustrated in Fig. 4. It requires the PSO input parameters and the limits of the search space. Then, the HNM is solved (ODE system combined with the ANNs) and the objective functions are calculated at the optimal fermentation time \(\left( {dP/dt = 0} \right)\), testing the inequality constraints when necessary. Finally, when the objective function is maximized, the optimal operational conditions for the process are reported.

Fig. 4 Process optimization flowchart

The PSO algorithm was obtained from the PySwarms library, and the optimization algorithm was implemented in Python/IPython Notebook version 2.7.8. The PSO performance parameters, shown in Table 3, were chosen to balance simulation time and computational cost against the quality of the predictions.

Table 3 PSO parameters

Results and discussion

Mathematical modeling

The kinetic model was determined in accordance with the architectures shown in Table 4. The specific rates of cell growth (ANN1), substrate consumption (ANN2) and product formation (ANN3) were estimated as a function of the instantaneous concentrations of substrate and cells, temperature and stirring speed. Several training rounds were conducted with different neural network architectures (1 to 4 hidden layers, 5 to 30 neurons per layer) of symmetrical type.

Table 4 ANNs architecture

As can be seen in Fig. 5, the simulated data are randomly spread around the bisector line. These results indicate that the values predicted by the neural networks satisfactorily represent the specific rates, as shown in Table 5.

Fig. 5 ANN simulations: comparison between ANN predictions and pseudo-experimental data (circles); (lines) bisector line and specific error of 10%

Table 5 Maximum and average errors of neural networks

The maximum errors of the ANNs are higher than 10%, which is the standard error value accepted in bioprocesses [34]. However, the amount of data above this limit is negligible compared with the total set. Moreover, all the mean deviations calculated were below 2% (Table 5), indicating the quality of the ANN predictions.

Table 6 shows the statistical analysis for the ANN learning process, where the RSD values represent the standard deviation of the data in comparison to the model. The apparent error calculated for the process was \(\varepsilon_{exp} > 8.37\%\), which indicates that the HNM is a more direct and efficient alternative to represent the process of ethanol production by S. cerevisiae CCA008 using cashew apple juice as substrate when compared to the mechanistic model previously reported by our group [7].

Table 6 HNM statistical analysis. The RSDs values represent the standard deviation of the data in comparison to the model. εexp is the experimental error

To verify the adequacy of the model to the experimental data used in training the ANNs, experimental and simulated data for assay 4 (see operational conditions in Table 1) were compared (Fig. 6). The model is able to describe the behavior of the fermentation, since it not only represents the experimental points but also lies within the confidence interval of the duplicates, at the 90% significance level.

Fig. 6 Experimental and simulated data for assay 4: (closed circles) cell concentration (g L−1); (closed square) substrate concentration (g L−1); (closed triangles) ethanol concentration (g L−1); (lines) HNM and confidence interval with 90% significance level for the experimental data

Model validation

After confirming the statistical adequacy of the HNM, the next step was to validate the model by checking whether it fits a new dataset (assay 13) not included in the learning step. Figure 7 compares the simulated and experimental data of assay 13 and shows that the model was able to predict the experimental data. Therefore, this simulation represents the general validation of the proposed model, since it tests the capability of the HNM to properly predict the system behavior, as verified by the statistical analysis presented in Table 6.

Fig. 7 HNM validation (assay 13): (closed circles) cell concentration (g L−1); (closed square) substrate concentration (g L−1); (closed triangles) ethanol concentration (g L−1); (lines) HNM and confidence interval with 90% significance level for the experimental data

The hybrid model validation returns higher RSD values and a lower apparent mean error when compared to the training step, satisfactorily reproducing the assay.

The performance of the HNM in validation is comparable to that obtained with the experimental data used in the training and test phases. Reasonable values of RSD and apparent error (F test) were obtained, lower than the experimental mean errors of the assays for biomass (11.8%), substrate (16.2%) and product (11.9%) concentrations.

Thus, the results obtained show that the ANNs are capable of adequately predicting the system behavior, even when operating under unexplored conditions. The HNM developed is particularly useful for process control and optimization, providing trustworthy predictions for biotechnological applications.

Optimization

One of the biggest challenges of this work was the optimization of multimodal objective functions, which arise from the inherent complexity of the biochemical processes and the number of optimization variables. For this reason, the analysis was restricted to a maximum fermentation time of 10 h with a low stirring speed that was still sufficient to keep the process well mixed. This selection was made to minimize the energy costs of the process. Table 7 presents the maximum values and operating conditions obtained by PSO for each objective function.

Table 7 Maximum values and optimal conditions of objective functions for efficiency (F1), productivity (F2) and the geometric mean between efficiency and productivity (F3)

The optimization of the efficiency objective function (F1) is in accordance with the statistical model proposed and experimentally validated in a previous work [29], using the Monod model [6, 29]. This fact indirectly validates the HNM-PSO optimization model. Under these conditions, the productivity was 6.3 g L−1 h−1 after 8 h of reaction.

High productivity (F2) was obtained (8.5 g L−1 h−1) after 7 h of fermentation with an efficiency of 83%, which is also in accordance with the optimization of the mechanistic model proposed before [7]. Comparing F1 and F2, the optimal initial substrate concentration and temperature are different. To maximize productivity, high levels of substrate should be available in the reaction medium, which promotes a higher ethanol concentration. A higher temperature also increases the reaction rates, which has an impact on productivity.

Therefore, to evaluate a combination of productivity and efficiency, a multiobjective function (F3) was proposed. F3 was defined to balance the ratio of ethanol produced to substrate consumed against the reaction time. Figure 8 shows a simulation for the ideal fermentation condition (initial substrate concentration 127 g L−1, temperature 35 °C, initial cell concentration 5.8 g L−1, and stirring speed 111 rpm), achieving an efficiency and productivity of 91.5% and 8.0 g L−1 h−1, respectively, at approximately 7 h of bioprocess.

Fig. 8 HNM simulation for the optimization of objective function F3

The optimal initial substrate concentration and temperature for function F3 lie between those obtained for F1 and F2, while the initial cell concentration and stirring speed remained almost constant. This was expected, because the objective function F3 captures the combined effect of efficiency and productivity.

Conclusion

The HNM approach proved to be a very efficient tool to analyze and simulate the alcoholic fermentation of cashew apple juice by a flocculent yeast (S. cerevisiae CCA008). The algorithm appears as an alternative for biotechnological processes that present a high level of complexity due to the physicochemical and biochemical laws and the genetic characteristics involved. In this work, the use of ANNs made it possible to dispense with a complex reaction mechanism. In addition, the HNM presented a high level of generality, allowing the model to be applied to other fermentation processes. Finally, the combination of advanced modeling techniques and optimization was successfully applied to maximize efficiency and reaction productivity. In general, the HNM-PSO optimization technique can be very useful in the optimization of bioprocesses, which are traditionally non-linear and involve multiple variables.