Introduction

Lactic-acid bacteria (LAB) have been used as starter cultures for numerous varieties of manufactured and commercialized fermented products, including fermented fruits, vegetables, cereal products, dairy products, fermented fish, and meat [26]. LAB can be considered a suitable starter culture in the production of fermented foods due to the ease of fermentation, the reduced risk of fermentation failure, and several functional properties [20]. Yogurt is a fermented milk product, which may be manufactured from products obtained from milk with or without compositional modification (as limited by some provisions), obtained by the action of symbiotic cultures of Streptococcus thermophilus and Lactobacillus bulgaricus, resulting in the reduction of pH with or without coagulation [12].

LAB growth in the fermentative process results in acid production and the lowering of the milk \(\mathrm{pH}\) from ~ 6.5 to 4.0–4.5 [18]. Industrial processes can have temperature (\(\mathrm{T}\)) changes due to the large scale of the equipment, in which gradients in such critical process control parameters can negatively impact fermentation performance [13]. As \(\mathrm{T}\) and \(\mathrm{pH}\) are factors that affect LAB growth, modeling the growth dependence of S. thermophilus and L. bulgaricus as a function of \(\mathrm T\) and \(\mathrm{pH}\) is essential to improve the quality of fermented milk products.

In predictive microbiology, primary models describe the response of the microorganisms as a function of time under a single set of conditions (e.g., temperature and pH). Secondary models describe the response of the primary model parameters (e.g., lag phase and maximum specific growth rate) to changes in the culture conditions [34]. The estimation of the microbial growth parameters is traditionally performed with the two-step modeling approach, in which the primary and secondary models are sequentially fitted to the data in two steps. On the other hand, growth curves can be analyzed with non-linear regression to estimate the kinetic parameters of primary and secondary models together in the so-called one-step modeling approach, in which the global residual sum of squares of the entire dataset is minimized in one step [15, 23].

The uncertainty about the estimated parameters by one-step or two-step approaches can be assessed by their 95% confidence intervals computed from the model fitting. Additionally, the Monte-Carlo method can be used to robustly quantify the expected uncertainty of the parameters and their confidence region. The distribution of sensitivity values can be estimated by Monte-Carlo analysis by repeatedly sampling from an assumed joint-probability density function of the parameters and by evaluating the sensitivities for each sample [3]. The experimental data and the estimated kinetic parameters can be used as prior information for estimating the posterior distribution of the kinetic parameters and the uncertainty of the predictions [16]. Thus, the Monte-Carlo method has been applied to predict the microbial growth in foods [17, 21].

The reliability of the predictive models must be assessed with statistical indexes by comparing their predictions to observations, particularly in foods. In this way, the utility of mathematical models to assist in food safety and quality decisions can be evaluated [5]. In this scope, the first stage of validation of a proposed model is often an internal validation, i.e., the validation is performed on the same data used for building the model [28, 33].

A previous study modeled the growth of L. bulgaricus and S. thermophilus with a simple first-order kinetic model, in which the dependence of the maximum specific growth rate was described with empirical models on the carbon and nitrogen substrates, the temperature, the pH, and the dissociated and undissociated forms of lactic acid [1]. However, the authors did not perform a deeper statistical analysis, and the model did not predict the experimental data well. Thus, this study aimed to model the growth dependence of S. thermophilus and L. bulgaricus as a function of temperature and pH and to estimate and internally validate their growth parameters and confidence intervals with different modeling approaches.

Material and methods

Experimental data

The experimental data of twenty-four kinetic experiments regarding the growth of S. thermophilus and L. bulgaricus in a prepared medium of whey and yeast extract at constant temperature and pH were kindly provided by Marzieh Aghababaie (personal communication). There are twelve datasets of pure culture of S. thermophilus (temperatures of 34.3 °C, 36.0 °C, 40.0 °C, 44.0 °C, and 45.6 °C; pH of 5.36, 5.70, 6.50, 7.30, and 7.63), as well as twelve datasets of pure culture of L. bulgaricus (temperatures of 38.3 °C, 40.0 °C, 44.0 °C, 48.0 °C, and 49.6 °C; pH of 4.56, 4.90, 5.70, 6.50, and 6.83). An experimental design diagram is shown in supplementary file 1. The experimental data were used in a previous work by Aghababaie et al. [1], in which they were expressed and modeled in biomass base (\(\mathrm X\), in g/L). In the current study, the experimental data were expressed and modeled as the logarithm of the microbial concentration base (\(\mathrm{log}N\), in log CFU/mL).

Fitting the primary and secondary models with the two-step modeling approach

The growth of each culture was described by the explicit form of the Baranyi and Roberts [6] primary model, as presented in Eqs. (1) and (2), in which \(i\) represents each microbial species (\(\mathrm{S}\) for S. thermophilus and \(\mathrm{L}\) for L. bulgaricus); \({\mathrm{y}}^{\mathrm{i}}={\mathrm{logN}}^{\mathrm{i}}\) (log CFU/mL) is the logarithm of the microbial concentration of each \(i\) species at time \(t\) (h); \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) (h−1) is the maximum specific growth rate of each \(\mathrm{i}\) species; \({\mathrm{y}}_{0}^{\mathrm{i}}={\mathrm{logN}}_{0}^{\mathrm{i}}\) (log CFU/mL) and \({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}={\mathrm{logN}}_{\mathrm{max}}^{\mathrm{i}}\) (log CFU/mL) are the logarithm of the initial and maximum cell concentrations, respectively, of each \(i\) species; and \({h}_{0}^{i}\) (dimensionless) is related to the physiological state of each \(i\) species.

$${\mathrm{y}}^{\mathrm{i}}\left(\mathrm{t}\right)={\mathrm{y}}_{0}^{\mathrm{i}}+{\upmu }_{\mathrm{max}}^{\mathrm{i}}\mathrm{F}\left(\mathrm{t}\right)-\mathrm{ln}\left(1+\frac{\mathrm{exp}\left({\upmu }_{\mathrm{max}}^{\mathrm{i}}\mathrm{F}\left(\mathrm{t}\right)\right)-1}{\mathrm{exp}\left({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}-{\mathrm{y}}_{0}^{\mathrm{i}}\right)}\right)$$
(1)
$$\mathrm{F}\left(\mathrm{t}\right)=\mathrm{t}+\frac{1}{{\upmu }_{\mathrm{max}}^{\mathrm{i}}}\mathrm{ln}\left(\mathrm{exp}\left(-{\upmu }_{\mathrm{max}}^{\mathrm{i}}\mathrm{t}\right)+\mathrm{exp}\left(-{\mathrm{h}}_{0}^{\mathrm{i}}\right)-\mathrm{exp}\left(-{\upmu }_{\mathrm{max}}^{\mathrm{i}}\mathrm{t}-{\mathrm{h}}_{0}^{\mathrm{i}}\right)\right)$$
(2)
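
As a minimal sketch of how Eqs. (1) and (2) can be evaluated, the Python function below computes \({\mathrm{y}}^{\mathrm{i}}(\mathrm{t})\) from the four primary model parameters. The function name and the numerical values are illustrative assumptions, not estimates from this study, and Python is used here only for illustration (the fitting in this work was done in Matlab).

```python
import math


def baranyi_explicit(t, y0, y_max, mu_max, h0):
    """Explicit Baranyi and Roberts model, Eqs. (1) and (2).

    t      : time (h)
    y0     : logarithm of the initial concentration
    y_max  : logarithm of the maximum concentration
    mu_max : maximum specific growth rate (1/h)
    h0     : dimensionless physiological-state parameter
    """
    # Eq. (2): adjustment function F(t), which delays growth during the lag phase
    f = t + (1.0 / mu_max) * math.log(
        math.exp(-mu_max * t) + math.exp(-h0) - math.exp(-mu_max * t - h0)
    )
    # Eq. (1): growth curve including the stationary-phase correction
    return y0 + mu_max * f - math.log(
        1.0 + (math.exp(mu_max * f) - 1.0) / math.exp(y_max - y0)
    )


# Hypothetical parameter values, chosen only to exercise the function
example_curve = [
    (t, baranyi_explicit(t, y0=7.0, y_max=9.0, mu_max=1.2, h0=1.35))
    for t in (0.0, 1.0, 2.0, 4.0, 8.0)
]
```

With these placeholder values the curve starts at \({\mathrm{y}}_{0}\) and levels off at \({\mathrm{y}}_{\mathrm{max}}\), which is the behavior expected from Eq. (1).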

The four primary model parameters (\({\mathrm{h}}_{0}^{\mathrm{i}}\), \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\), \({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\), and \({\mathrm{y}}_{0}^{\mathrm{i}}\)) of each \(\mathrm{i}\) microbial species were estimated for each of the twelve experimental datasets of each pure culture. The fitting procedure was performed using the fit function of the Curve Fitting Tool of the Matlab R2020b software (Mathworks, Natick, USA), with the non-linear least squares method and the trust-region reflective Newton algorithm. The goodness-of-fit statistical indexes coefficient of determination (\({\mathrm{R}}^{2}\)) and root-mean-squared errors (\(\mathrm{RMSE}\)) were provided by the software. Then, the average of all \({\mathrm{h}}_{0}^{\mathrm{i}}\) values of each \(i\) species was calculated (\({\mathrm{h}}_{0,\mathrm{avg}}^{\mathrm{i}}\)) and fixed in the model. Finally, the three primary model parameters (\({\upmu }_{\mathrm{max}}^{\mathrm{i}}\), \({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\), and \({\mathrm{y}}_{0}^{\mathrm{i}}\)) were estimated again for each pure culture.

The \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) parameter of the primary model depends on extrinsic and intrinsic factors, such as temperature (\(\mathrm{T}\)) and pH (\(\mathrm{pH}\)) in this study. Such dependences were modeled by the cardinal temperature and pH secondary models proposed by Rosso et al. [31], as given by Eqs. (3) to (5). The \({\upgamma }_{\mathrm{T}}^{\mathrm{i}}\) and \({\upgamma }_{\mathrm{pH}}^{\mathrm{i}}\) functions describe the influence of the factors on \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) of \(\mathrm{i}\) species, and they are composed of biologically meaningful parameters, in which \({\upmu }_{\mathrm{opt}}^{\mathrm{i}}\) (h−1) is the maximum specific growth rate of each \(i\) species at the optimal condition; \({\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}\), and \({\mathrm{T}}_{\mathrm{max}}^{\mathrm{i}}\) are the minimum, optimum, and maximum temperatures for the growth of each \(i\) species, respectively; and \({\mathrm{pH}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{pH}}_{\mathrm{opt}}^{\mathrm{i}}\), and \({\mathrm{pH}}_{\mathrm{max}}^{\mathrm{i}}\) are the minimum, optimum, and maximum pH values for the growth of each \(i\) species, respectively. As the fermentation processes were performed at temperatures and pH values below the maximum (\(\mathrm{T}<{\mathrm{T}}_{\mathrm{max}}\) and \(\mathrm{pH}<{\mathrm{pH}}_{\mathrm{max}}\)) and above the minimum (\(\mathrm{T}>{\mathrm{T}}_{\mathrm{min}}\) and \(\mathrm{pH}>{\mathrm{pH}}_{\mathrm{min}}\)), \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) values are always higher than zero; therefore, the conditional branches of Eqs. (4) and (5) related to these growth limits were not needed in the model.

$${\upmu }_{\mathrm{max}}^{\mathrm{i}}={\upmu }_{\mathrm{opt}}^{\mathrm{i}}{\upgamma }_{\mathrm{T}}^{\mathrm{i}}{\upgamma }_{\mathrm{pH}}^{\mathrm{i}}$$
(3)
$${\upgamma }_{\mathrm{T}}^{\mathrm{i}}=\frac{\left(\mathrm{T}- {\mathrm{T}}_{\mathrm{max}}^{\mathrm{i}}\right){(\mathrm{T}- {\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}})}^{2}}{\left({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}-{\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}\right)\left[\left({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}-{\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}\right)\left(\mathrm{T}-{\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}\right)-\left({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}-{\mathrm{T}}_{\mathrm{max}}^{\mathrm{i}}\right)\left({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}+{\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}-2\mathrm{T}\right)\right]}$$
(4)
$${\upgamma }_{\mathrm{pH}}^{\mathrm{i}}=\frac{\left(\mathrm{pH}- {\mathrm{pH}}_{\mathrm{min}}^{\mathrm{i}}\right)(\mathrm{pH}- {\mathrm{pH}}_{\mathrm{max}}^{\mathrm{i}})}{\left(\mathrm{pH}-{\mathrm{pH}}_{\mathrm{min}}^{\mathrm{i}}\right)\left(\mathrm{pH}-{\mathrm{pH}}_{\mathrm{max}}^{\mathrm{i}}\right)- {\left(\mathrm{pH}-{\mathrm{pH}}_{\mathrm{opt}}^{\mathrm{i}}\right)}^{2}}$$
(5)
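
The secondary model of Eqs. (3) to (5) can be sketched in a few lines of Python. The function names are hypothetical, and the cardinal values in the example are placeholders chosen for illustration, not the estimates reported in this study. As in the text, the functions assume conditions strictly inside the growth range, so no conditional branches are included.

```python
def gamma_temperature(temp, t_min, t_opt, t_max):
    """Cardinal temperature function, Eq. (4); assumes t_min < temp < t_max."""
    numerator = (temp - t_max) * (temp - t_min) ** 2
    denominator = (t_opt - t_min) * (
        (t_opt - t_min) * (temp - t_opt)
        - (t_opt - t_max) * (t_opt + t_min - 2.0 * temp)
    )
    return numerator / denominator


def gamma_ph(ph, ph_min, ph_opt, ph_max):
    """Cardinal pH function, Eq. (5); assumes ph_min < ph < ph_max."""
    numerator = (ph - ph_min) * (ph - ph_max)
    return numerator / (numerator - (ph - ph_opt) ** 2)


def mu_max(temp, ph, mu_opt, cardinal_t, cardinal_ph):
    """Eq. (3): optimal rate scaled by the temperature and pH functions."""
    return mu_opt * gamma_temperature(temp, *cardinal_t) * gamma_ph(ph, *cardinal_ph)


# Placeholder cardinal values (min, opt, max), for illustration only
CARDINAL_T = (25.0, 43.5, 52.0)
CARDINAL_PH = (3.5, 4.91, 8.0)

rate_at_optimum = mu_max(43.5, 4.91, 1.24, CARDINAL_T, CARDINAL_PH)
rate_off_optimum = mu_max(40.0, 5.70, 1.24, CARDINAL_T, CARDINAL_PH)
```

Both \(\upgamma\) functions equal one at the optimum and fall below one elsewhere inside the range, so `rate_off_optimum` is smaller than `rate_at_optimum`.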

The seven secondary model parameters (\({\upmu }_{\mathrm{opt}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{max}}^{\mathrm{i}}\), \({\mathrm{pH}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{pH}}_{\mathrm{opt}}^{\mathrm{i}}\), and \({\mathrm{pH}}_{\mathrm{max}}^{\mathrm{i}}\)) of each \(i\) microbial species were estimated from the twelve \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) values of each pure culture. The fitting procedure was also performed using the fit function of the Curve Fitting Tool of the Matlab R2020b software. The average of all \({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\) values of each \(\mathrm{i}\) species was calculated (\({\mathrm{y}}_{\mathrm{max},\mathrm{avg}}^{\mathrm{i}}\)) and considered in the internal model validation.

Fitting the primary and secondary models with the one-step modeling approach

The growth of each \(i\) pure culture was described by the differential form of the Baranyi and Roberts [6] primary model, as shown in Eqs. (6) and (7). The initial conditions for Eqs. (6) and (7) were \({\mathrm{y}}^{\mathrm{i}}\left(0\right)={\mathrm{y}}_{0}^{\mathrm{i}}\) and \({\mathrm{Q}}^{\mathrm{i}}\left(0\right)={\mathrm{Q}}_{0}^{\mathrm{i}}\), respectively, in which \({\mathrm{Q}}_{0}^{\mathrm{i}}\) is a parameter related to the physiological state of the cells (\({\mathrm{h}}_{0}^{\mathrm{i}}\)) of each \(i\) species at time zero, which can also be related to the adaptation time (\({\uplambda }^{\mathrm{i}}\)) and \({\mu }_{\mathrm{max}}^{i}\) of each microbial species, as given by Eq. (8). The \({\mathrm{Q}}_{0}^{\mathrm{i}}\) values of each \(i\) species were the same as those of the two-step modeling approach.

$$\frac{{\mathrm{dy}}^{\mathrm{i}}}{\mathrm{dt}}={\upmu }_{\mathrm{max}}^{\mathrm{i}}\left(\frac{1}{1+{\mathrm{e}}^{-{\mathrm{Q}}^{\mathrm{i}}}}\right)\left(1-{\mathrm{e}}^{\left({\mathrm{y}}^{\mathrm{i}}-{\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\right)}\right)$$
(6)
$$\frac{{\mathrm{dQ}}^{\mathrm{i}}}{\mathrm{dt}}={\upmu }_{\mathrm{max}}^{\mathrm{i}}$$
(7)
$${\mathrm{h}}_{0}^{\mathrm{i}}=\mathrm{ln}\left(1+1/{\mathrm{Q}}_{0}^{\mathrm{i}}\right)={\upmu }_{\mathrm{max}}^{\mathrm{i}}{\uplambda }^{\mathrm{i}}$$
(8)
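
The differential form in Eqs. (6) and (7) can be integrated numerically, as in the sketch below. It uses a fixed-step classical Runge-Kutta scheme written in plain Python as a stand-in for the ode45 solver used in this work, and all names and parameter values are hypothetical. One assumption is stated explicitly: because Eq. (6) uses the term \(1/(1+{\mathrm{e}}^{-\mathrm{Q}})\), the helper `q0_from_h0` treats \(\mathrm{Q}\) as the natural logarithm of the physiological-state variable, so that \({\mathrm{h}}_{0}=\mathrm{ln}(1+{\mathrm{e}}^{-{\mathrm{Q}}_{0}})\); if Eq. (8) is read with \({\mathrm{Q}}_{0}\) as the untransformed variable, the conversion would differ.

```python
import math


def baranyi_rhs(y, q, mu_max, y_max):
    """Right-hand sides of Eqs. (6) and (7)."""
    adjustment = 1.0 / (1.0 + math.exp(-q))   # lag-phase term
    inhibition = 1.0 - math.exp(y - y_max)    # stationary-phase term
    return mu_max * adjustment * inhibition, mu_max


def q0_from_h0(h0):
    """Assumed conversion: h0 = ln(1 + exp(-Q0)), valid for h0 > 0."""
    return -math.log(math.exp(h0) - 1.0)


def integrate_baranyi(y0, q0, mu_max, y_max, t_end, dt=0.01):
    """Integrate Eqs. (6)-(7) from 0 to t_end with fixed-step RK4."""
    y, q = y0, q0
    for _ in range(int(round(t_end / dt))):
        k1y, k1q = baranyi_rhs(y, q, mu_max, y_max)
        k2y, k2q = baranyi_rhs(y + 0.5 * dt * k1y, q + 0.5 * dt * k1q, mu_max, y_max)
        k3y, k3q = baranyi_rhs(y + 0.5 * dt * k2y, q + 0.5 * dt * k2q, mu_max, y_max)
        k4y, k4q = baranyi_rhs(y + dt * k3y, q + dt * k3q, mu_max, y_max)
        y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    return y, q


# Hypothetical values, for illustration only
Q0 = q0_from_h0(1.35)
y_end, q_end = integrate_baranyi(y0=7.0, q0=Q0, mu_max=1.2, y_max=9.0, t_end=10.0)
```

Under the stated assumption, the integrated curve agrees with the explicit form of Eqs. (1) and (2) for the same parameter values, which is a convenient consistency check between the two-step and one-step implementations.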

The dependence of the \({\upmu }_{\mathrm{max}}^{\mathrm{i}}\) parameter of the primary model on the temperature (\(\mathrm{T}\)) and pH was also modeled by the cardinal temperature and pH secondary models proposed by Rosso et al. [31], as given by Eqs. (3) to (5). The seven secondary model parameters (\({\upmu }_{\mathrm{opt}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{opt}}^{\mathrm{i}}\), \({\mathrm{T}}_{\mathrm{max}}^{\mathrm{i}}\), \({\mathrm{pH}}_{\mathrm{min}}^{\mathrm{i}}\), \({\mathrm{pH}}_{\mathrm{opt}}^{\mathrm{i}}\), and \({\mathrm{pH}}_{\mathrm{max}}^{\mathrm{i}}\)) and one primary model parameter (\({\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\)) of each \(i\) microbial species were estimated directly from all the twelve sets of experimental data of each pure culture. The fitting procedure to estimate the model parameters (and the estimation of the pairwise correlation values) from the experimental data was performed with the AMIGO_PE task of the AMIGO2 R2019b toolbox [4] in the Matlab R2020b software, with the lsq cost function and Q_I (no weighting) cost type, ode45 solver, and sensmat to compute the sensitivities.

Identifiability analysis with a Monte Carlo–based approach

Identifiability analysis was performed with the AMIGO_Rident task of the AMIGO2 toolbox in the Matlab software. The computer used to perform the simulations was a Lenovo Thinkpad model T470, equipped with Intel® Core™ i7-7600U 2.90 GHz, 8.00 GB RAM, and HD SSD 256 GB. The initial estimate of each model parameter was established from the values obtained in the one-step modeling approach, in which minimum and maximum allowed values of the parameters were established as − 30% and + 30% of the initial estimates, respectively. The Monte Carlo analysis was performed with 500 runs. The experimental noise was assumed as homoscedastic, with 10% of standard deviation in relation to the experimental data values. The results were expressed as frequency distributions of the model parameter estimates in intervals at every 0.1 units.
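
The logic of a Monte Carlo analysis of this kind (perturb the data with noise, re-estimate, repeat, then summarize the distribution of the estimates) can be illustrated with a deliberately small toy problem. The sketch below is not the AMIGO_Rident procedure: it replaces the eight-parameter growth model with a straight line whose slope is re-estimated by ordinary least squares, and the noise level, seed, and all names are assumptions made for illustration.

```python
import random


def fit_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    t_mean = sum(times) / len(times)
    v_mean = sum(values) / len(values)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


def monte_carlo_slopes(times, clean_values, noise_sd, n_runs, seed=0):
    """Re-estimate the slope from n_runs noisy copies of the data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in clean_values]
        estimates.append(fit_slope(times, noisy))
    return estimates


def percentile_interval(samples, lower=2.5, upper=97.5):
    """Empirical interval from sorted samples (nearest-rank style)."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return ordered[round(lower / 100.0 * last)], ordered[round(upper / 100.0 * last)]


# Toy data: a straight line with a known slope (hypothetical values)
TRUE_SLOPE = 1.2
times = [0.5 * k for k in range(10)]
clean = [7.0 + TRUE_SLOPE * t for t in times]

estimates = monte_carlo_slopes(times, clean, noise_sd=0.1, n_runs=500)
mean_estimate = sum(estimates) / len(estimates)
low, high = percentile_interval(estimates)
```

In this toy case the mean of the 500 estimates lies close to the known slope and the empirical 95% interval brackets it; in the full analysis, the same summary is made for each of the model parameters.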

Internal model validation

The parameters estimated from the two-step and one-step modeling approaches were substituted into the primary and secondary models and, then, the model predictions (\({\mathrm{pd}}_{\mathrm{i}}\)) were compared with the observed data (\({\mathrm{ob}}_{\mathrm{i}}\)) of the twelve sets of each pure culture with the purpose of internal model validation. The \(\mathrm{RMSE}\) (Eq. (9)), percent bias factor (\({\mathrm{\%B}}_{\mathrm{f}}\), Eq. (12)), percent discrepancy factor (\({\mathrm{\%D}}_{\mathrm{f}}\), Eq. (14)), and mean absolute error (\(\mathrm{MAE}\), Eq. (15)) were used to assess the ability of the model to describe the experimental data [5]. \({\mathrm{\%B}}_{\mathrm{f}}\) values higher than 0 indicate that the model overpredicts the data, while \({\mathrm{\%B}}_{\mathrm{f}}\) values lower than 0 indicate that the model underpredicts the data. The \(\mathrm{RMSE}\) and \({\mathrm{\%D}}_{\mathrm{f}}\) are equal to or higher than 0, in which higher values suggest higher discrepancies between the model and the experimental data.

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{pd}}_{\mathrm{i}}-{\mathrm{ob}}_{\mathrm{i}}\right)}^{2}}{\mathrm{n}}}$$
(9)
$${\mathrm{B}}_{\mathrm{f}}=\mathrm{exp}\left(\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}\left(\mathrm{ln}\,{\mathrm{pd}}_{\mathrm{i}}-\mathrm{ln}\,{\mathrm{ob}}_{\mathrm{i}}\right)}{\mathrm{n}}\right)$$
(10)
$$\mathrm{sgn}\left(\mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}\right)=\left\{\begin{array}{ll}+1,& \text{if } \mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}>0\\ 0,& \text{if } \mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}=0\\ -1,& \text{if } \mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}<0\end{array}\right.$$
(11)
$${\mathrm{\%B}}_{\mathrm{f}}=\mathrm{sgn}\left(\mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}\right)\left(\mathrm{exp}\left|\mathrm{ln}\,{\mathrm{B}}_{\mathrm{f}}\right|-1\right)100\mathrm{\%}$$
(12)
$${\mathrm{A}}_{\mathrm{f}}=\mathrm{exp}\left(\sqrt{\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left(\mathrm{ln}\,{\mathrm{pd}}_{\mathrm{i}}-\mathrm{ln}\,{\mathrm{ob}}_{\mathrm{i}}\right)}^{2}}{\mathrm{n}}}\right)$$
(13)
$${\mathrm{\%D}}_{\mathrm{f}}=\left({\mathrm{A}}_{\mathrm{f}}-1\right)100\mathrm{\%}$$
(14)
$$\mathrm{MAE}=\frac{1}{\mathrm{n}}\sum_{\mathrm{i}=1}^{\mathrm{n}}\left|{\mathrm{pd}}_{\mathrm{i}}-{\mathrm{ob}}_{\mathrm{i}}\right|$$
(15)
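
The indexes of Eqs. (9) to (15) are straightforward to compute from paired predictions and observations. The Python sketch below is one possible implementation; the function name, the dictionary keys, and the test values are assumptions introduced for illustration.

```python
import math


def validation_indexes(predicted, observed):
    """Compute RMSE, %Bf, %Df, and MAE following Eqs. (9) to (15)."""
    n = len(observed)
    residuals = [p - o for p, o in zip(predicted, observed)]
    log_ratios = [math.log(p) - math.log(o) for p, o in zip(predicted, observed)]

    rmse = math.sqrt(sum(r * r for r in residuals) / n)      # Eq. (9)
    ln_bf = sum(log_ratios) / n                               # ln of Eq. (10)
    sign = (ln_bf > 0) - (ln_bf < 0)                          # Eq. (11)
    pct_bf = sign * (math.exp(abs(ln_bf)) - 1.0) * 100.0      # Eq. (12)
    a_f = math.exp(math.sqrt(sum(v * v for v in log_ratios) / n))  # Eq. (13)
    pct_df = (a_f - 1.0) * 100.0                              # Eq. (14)
    mae = sum(abs(r) for r in residuals) / n                  # Eq. (15)

    return {"RMSE": rmse, "%Bf": pct_bf, "%Df": pct_df, "MAE": mae}


# Hypothetical observations (log CFU/mL) and predictions inflated by 25%
observed = [7.0, 8.0, 9.0]
overpredicted = [1.25 * o for o in observed]
indexes = validation_indexes(overpredicted, observed)
```

With every prediction 25% above its observation, the sketch returns \({\mathrm{\%B}}_{\mathrm{f}}\) and \({\mathrm{\%D}}_{\mathrm{f}}\) of 25%, and dividing instead of multiplying by 1.25 changes only the sign of \({\mathrm{\%B}}_{\mathrm{f}}\).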

Results and discussion

Model parameters estimated from two-step and one-step modeling approaches

The parameters estimated in the first step of the two-step modeling approach from the fitting of the Baranyi and Roberts primary model (Eqs. (1) and (2)) to the experimental data of L. bulgaricus and S. thermophilus are shown in Table 1. The model was able to describe the experimental data, with \({R}^{2}\) ≥ 0.970 and \(\mathrm{RMSE}\) ≤ 0.112 log CFU/mL. The Baranyi and Roberts model has been extensively used, with success, in predictive microbiology studies to describe the bacterial growth kinetics in foods [24, 25]. Silva et al. [32] chose the Baranyi and Roberts model to define the growth parameters of three LAB (Lactobacillus plantarum, Weissella viridescens, and Lactobacillus sakei) because the values of statistical indices of this model were slightly better than the values of the modified Gompertz model.

Table 1 Model parameters (± 95% confidence intervals) and statistical indexes of the fitting of the primary model, Eqs. (1) and (2), to S. thermophilus and L. bulgaricus growth data from the experimental data of twelve datasets of each pure culture

The results shown in Table 1 indicate that the initial (\({y}_{0}\)) and maximum (\({y}_{\mathrm{max}}\)) concentrations of each species were close to each other across the experiments. The close \({y}_{0}\) values in the experiments with inoculated medium are a good indicator that the inoculation procedure was well conducted, and the close \({y}_{\mathrm{max}}\) values are a good indicator that the maximum concentration of each microbial species tends to an average value, which can therefore be fixed in the model. The averages of all the \({h}_{0}\) values (1.91, 1.25, 1.90, 0.31, 1.26, 0.42, 0.54, 1.04, 2.10, 2.20, 1.81, and 1.47 for S. thermophilus, and 2.23, 2.08, 2.63, 1.02, 1.94, 0.00, 0.17, 0.85, 3.23, 3.83, 2.98, and 2.27 for L. bulgaricus) of each species (\({h}_{0,av}\)) were higher than 0, suggesting that both bacteria showed adaptation phases, with a higher adaptation for L. bulgaricus than for S. thermophilus. The \({h}_{0,av}\) estimates presented satisfactory 95% confidence intervals, considering that the adaptation time in the bacterial growth (related to the physiological state of cells, as shown in Eq. (8)) is the most uncertain parameter [7].
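
The averages of the \({h}_{0}\) values listed above can be reproduced with a short calculation. The Python sketch below uses the twelve values quoted in the text for each species; the variable and function names are assumptions, and the \({\mu }_{\mathrm{max}}\) value used to convert \({h}_{0}\) into an adaptation time through Eq. (8) is a hypothetical placeholder.

```python
# h0 values quoted in the text for the twelve datasets of each species
H0_S_THERMOPHILUS = [1.91, 1.25, 1.90, 0.31, 1.26, 0.42,
                     0.54, 1.04, 2.10, 2.20, 1.81, 1.47]
H0_L_BULGARICUS = [2.23, 2.08, 2.63, 1.02, 1.94, 0.00,
                   0.17, 0.85, 3.23, 3.83, 2.98, 2.27]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def adaptation_time(h0, mu_max):
    """Adaptation time from Eq. (8): lambda = h0 / mu_max (h)."""
    return h0 / mu_max


h0_avg_s = mean(H0_S_THERMOPHILUS)   # about 1.35
h0_avg_l = mean(H0_L_BULGARICUS)     # about 1.94

# Placeholder growth rate (1/h), used only to illustrate Eq. (8)
lag_s = adaptation_time(h0_avg_s, mu_max=1.0)
lag_l = adaptation_time(h0_avg_l, mu_max=1.0)
```

The calculation gives averages of about 1.35 for S. thermophilus and about 1.94 for L. bulgaricus, consistent with the higher adaptation noted for L. bulgaricus.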

The \({\mu}_{\max}\) model parameter showed narrower 95% confidence intervals and a strong dependence on \(T\) and \(\mathrm{pH}\). One can see a priori that, in general, lower and higher values of \(T\) in the experimental range lead to lower values of \({\mu }_{\mathrm{max}}\) for both bacteria, while intermediate values of \(T\) lead to higher values of \({\mu }_{\mathrm{max}}\). On the other hand, in general, lower values of \(\mathrm{pH}\) in the experimental range lead to lower and higher values of \({\mu }_{\mathrm{max}}\) for S. thermophilus and L. bulgaricus, respectively, while higher values of \(\mathrm{pH}\) lead to the opposite (higher and lower values of \({\mu }_{\mathrm{max}}\) for S. thermophilus and L. bulgaricus, respectively). These dependences were quantitatively described by the secondary model (by \({\upgamma }_{\mathrm{T}}\) and \({\upgamma }_{\mathrm{pH}}\)), as follows, in which the observation of the trends indicates the appropriate form of the model.

In the typical production of stirred yoghurt, with incubation at 42–43 °C, about 0.02% inoculum of highly concentrated culture is used, in which most yoghurt has a ratio of cocci to bacilli between 1:1 and 2:1 (Bylund, 2015) [11]. This inoculum level of LAB corresponds to the decimal level applied in the experiments of the current study (0.02% of \({10}^{9}\) CFU/mL = 2 × \({10}^{7}\) CFU/mL). In this context of high initial concentration of the inoculum, one important trend that has been experimentally observed is that high initial cell concentrations tend to decrease the bacterial maximum specific growth rates [19]. Mathematically, a high initial concentration also affects the estimation of the maximum specific growth rate because the term of the stationary phase (\(1-{\mathrm{e}}^{\left({\mathrm{y}}^{\mathrm{i}}-{\mathrm{y}}_{\mathrm{max}}^{\mathrm{i}}\right)}\), Eq. (6)) of the Baranyi and Roberts model is directly affected by the concentration. For example, for a maximum concentration of \({10}^{9}\) CFU/mL and initial concentrations of \({10}^{5}\), \({10}^{6}\), and \({10}^{7}\) CFU/mL, the resultant terms of the stationary phase are 0.982, 0.950, and 0.864, respectively; in a scenario with a low initial concentration, the term tends to one. Therefore, the value of the maximum specific growth rate can be underestimated when the initial concentration is high. The fitting of the primary and secondary models, Eqs. (3) to (8), with the one-step modeling approach resulted in lower values of \(\mathrm{RMSE}\) and higher values of \({R}^{2}\) than the two-step modeling approach for both species, L. bulgaricus and S. thermophilus. Furthermore, Akkermans et al. [2] stated that the two-step modeling approach results in less precise and less accurate calculations of the 95% confidence bounds than the one-step method when applying the commonly used linear approximation, corroborating the results of this study. Information on the variability of the model parameters is lost by making the intermediate step in the two-step method [2].
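
The stationary-phase term of Eq. (6) discussed in the preceding paragraph can be evaluated directly. The short Python sketch below does so for the three initial concentrations quoted in the text; the function name is an assumption, and the logarithmic concentrations are inserted into the exponential exactly as Eq. (6) is written.

```python
import math


def stationary_term(y, y_max):
    """Stationary-phase term of Eq. (6): 1 - exp(y - y_max)."""
    return 1.0 - math.exp(y - y_max)


Y_MAX = 9.0  # logarithm of the maximum concentration quoted in the text

# Initial concentrations of 10^5, 10^6, and 10^7 CFU/mL (as logarithms)
terms = {y0: stationary_term(y0, Y_MAX) for y0 in (5.0, 6.0, 7.0)}
```

The three results agree with the quoted values to within rounding in the third decimal place, and the term moves further below one as the initial concentration approaches the maximum.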

The models were able to describe the experimental data, as indicated by low \(\mathrm{RMSE}\) and high \({R}^{2}\) values, as shown in Table 1. The eight model parameters estimated for each species in the fits are shown in the same table. The estimated parameters indicate that minimum, optimal, and maximum values for each factor (\(\mathrm{T}\) and \(\mathrm{pH}\)) are different for each species. The estimated optimum temperature for L. bulgaricus (\({\mathrm{T}}_{\mathrm{opt}}=43.5\pm 1.0\)) was slightly higher than that of S. thermophilus (\({\mathrm{T}}_{\mathrm{opt}}=41.0\pm 2.9\)), although they are statistically equivalent. On the other hand, the estimated optimum pH for S. thermophilus (\({\mathrm{pH}}_{\mathrm{opt}}=7.44\pm 0.71\)) was statistically higher than that of L. bulgaricus (\({\mathrm{pH}}_{\mathrm{opt}}=4.91\pm 0.24\)). Both estimated trends corroborate the literature [1, 9, 30], although the estimated values for the parameters may differ from the literature.

The maximum specific growth rates estimated for both bacteria (base-10 logarithm) at optimal conditions (\({\mu }_{\mathrm{opt}}\)) were close (~ 1.24 h−1). These estimates indicate that S. thermophilus incubated at \(T\) = 41.0 °C and \(\mathrm{pH}\) = 7.44 or L. bulgaricus at \(\mathrm{T}\) = 43.5 °C and \(\mathrm{pH}\) = 4.91 may have doubling times as low as ~ 15 min. Beal and Corrieu [8] estimated maximum specific growth rates (natural logarithm base) as high as 3.84 h−1 for S. thermophilus 404 at \(\mathrm{pH}\) = 6.70 and \(T\) = 37.5 °C, and 3.96 h−1 for L. bulgaricus 398 at \(\mathrm{pH}\) = 5.20 and \(T\) = 42.0 °C, leading to doubling times as low as ~ 11 min. Aghababaie et al. [1] estimated maximum specific growth rates (from biomass data, in g/L) of 1.18 h−1 for S. thermophilus at \(\mathrm{pH}\) = 6.87 and \(\mathrm{T}\) = 42.8 °C, and 1.95 h−1 for L. bulgaricus at \(\mathrm{pH}\) = 5.25 and \(T\) = 44.0 °C. However, these estimates cannot be directly compared to the results of the present study because they were obtained on different bases (log CFU/mL and g/L). The maximum specific growth rate is affected by many factors related to the organism (e.g., serotype, physiological state of the cells) and the medium (e.g., composition, environment).
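
The doubling times quoted above follow from dividing the logarithm of 2, taken in the same base as the growth rate, by the rate. The Python sketch below makes the base explicit; the function name is an assumption.

```python
import math


def doubling_time_min(mu, log_base):
    """Doubling time in minutes for a specific growth rate mu (1/h).

    log_base must match the logarithm in which mu is expressed
    (10 for log10-based rates, math.e for natural-log rates).
    """
    return math.log(2.0, log_base) / mu * 60.0


# Rate from this study (base-10 logarithm)
t_d_this_study = doubling_time_min(1.24, 10)

# Rates reported by Beal and Corrieu [8] (natural logarithm)
t_d_st_404 = doubling_time_min(3.84, math.e)
t_d_lb_398 = doubling_time_min(3.96, math.e)
```

The calculation gives roughly 14.6 min for the rate of this study and roughly 10.8 and 10.5 min for the two literature rates, in line with the values of ~ 15 min and ~ 11 min stated in the text.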

The parameters related to the minimum (\({\mathrm{T}}_{\mathrm{min}}\) and \({\mathrm{pH}}_{\mathrm{min}}\)) and maximum (\({\mathrm{T}}_{\mathrm{max}}\) and \({\mathrm{pH}}_{\mathrm{max}}\)) values of the factors were estimated with reasonable extrapolation considering the temperature (34.3 to 49.6 °C) and pH (4.56 to 7.63) ranges applied experimentally, as shown in Table 2. Consequently, the estimated values of these parameters did not have high accuracy, as can be seen from their wider 95% confidence intervals (Table 2). Concerning the experiment design based on the sensitivity functions, Bernaerts et al. (2005) [10] suggest selecting temperature/pH levels at/near the maxima of the model output sensitivities because data at these positions have the largest influence on the parameter values, and experimental errors at these points shall have major (adverse) effects on the parameter values during parameter estimation.

Table 2 Secondary growth parameters (± 95% confidence intervals), Eqs. (3) to (5), of S. thermophilus and L. bulgaricus estimated with the two-step and one-step modeling approaches from the experimental data of twelve datasets of each pure culture

The pairwise correlation values (results in Table 3) for temperature parameters show that \({\mathrm{T}}_{\mathrm{min}}\), \({\mathrm{T}}_{\mathrm{opt}}\), and \({\mathrm{T}}_{\mathrm{max}}\) estimates were highly correlated (in bold), mainly \({\mathrm{T}}_{\mathrm{min}}\) and \({\mathrm{T}}_{\mathrm{max}}\). Le Marc et al. [22] stated that strong linear correlations were highlighted between the cardinal temperature parameters that were valid across a range of different bacterial species. On the other hand, for pH parameters, \({\mathrm{pH}}_{\mathrm{min}}\), \({\mathrm{pH}}_{\mathrm{opt}}\), and \({\mathrm{pH}}_{\mathrm{max}}\) estimates were less correlated. The \({\upmu }_{\mathrm{opt}}\) and \({\mathrm{y}}_{\mathrm{max}}\) parameters showed a low correlation to the other parameters.

Table 3 Correlation values of the model parameters of S. thermophilus and L. bulgaricus estimated from the one-step modeling approach. Estimates highly correlated are shown in bold

The \({\upgamma }_{\mathrm{T}}^{\mathrm{i}}\) and \({\upgamma }_{\mathrm{pH}}^{\mathrm{i}}\) functions reduce the specific growth rate values as a function of the factor levels (\(\mathrm{T}\) and \(\mathrm{pH}\)) through their multiplication in the root equations, such as Eq. (3). The response of these functions should be positive and lower than or equal to one (equal to one at the optimal condition). The functions \({\upgamma }_{\mathrm{T}}^{\mathrm{i}}\) and \({\upgamma }_{\mathrm{pH}}^{\mathrm{i}}\) given by Eqs. (4) and (5), respectively, show the expected behavior, as can be seen in Fig. 1. The analysis of the temperature function \({\upgamma }_{\mathrm{T}}^{\mathrm{i}}\) shows that L. bulgaricus is more sensitive to temperature variations than S. thermophilus. For instance, from Eq. (4), a change of ~ 3.6 °C in the incubation temperature of L. bulgaricus can lead to a ~ 10% drop in its specific growth rate with respect to the optimal one, whereas a change of ~ 4.7 °C would be required for S. thermophilus. A similar analysis of the pH function \({\upgamma }_{\mathrm{pH}}^{\mathrm{i}}\), Eq. (5), suggests that both bacteria, L. bulgaricus and S. thermophilus, have similar sensitivities to pH variations. For instance, a change of ~ 0.88 pH units in the incubation can lead to a ~ 10% drop in the maximum specific growth rate of each bacterium in relation to the optimal one.

Fig. 1

Curves of the \(\gamma\) functions (continuous lines), Eqs. (4) and (5), for a L. bulgaricus to the temperature (\({\gamma }_{T}^{L}\)), b S. thermophilus to the temperature (\({\gamma }_{T}^{S}\)), c L. bulgaricus to the pH (\({\gamma }_{pH}^{L}\)), and d S. thermophilus (\({\gamma }_{pH}^{S}\)) to the pH. The circles (symbols) represent the levels of each factor in which experiments were performed

Identifiability analysis

The frequency distributions of the model parameters estimated from 500 runs of the Monte-Carlo identifiability analysis for the growth of S. thermophilus and L. bulgaricus are presented in Figs. 2 and 3, respectively. The time to compute the 500 runs in Matlab with the AMIGO2 toolbox was about 36 h for each microorganism. Some model parameters, especially the optimal ones (\({\upmu }_{\mathrm{opt}}\), \({\mathrm{T}}_{\mathrm{opt}}\), and \({\mathrm{pH}}_{\mathrm{opt}}\)), showed frequency distributions which can be interpreted as normal distributions, with one peak with higher frequency around the mean value and narrow 95% confidence intervals. Poschet et al. [29] stated that, for a high number of data points, all the Baranyi and Roberts model parameter distributions tend to normality. On the other hand, the parameters related to minimum and maximum values of the \(\mathrm{T}\) and \(\mathrm{pH}\) factors (\({\mathrm{T}}_{\mathrm{min}}\), \({\mathrm{T}}_{\mathrm{max}}\), \({\mathrm{pH}}_{\mathrm{min}}\), and \({\mathrm{pH}}_{\mathrm{max}}\)) showed irregular frequency distributions, with high frequencies at the bounds of the parameter intervals (± 30% of the parameter values). The higher uncertainties in the minimum and maximum parameters than in the optimum ones can be explained by the experimental design: the experiments were performed at levels close to the optimal conditions of each factor, and no experiments were performed close to the minimum and maximum levels. Therefore, the accuracy of the model parameters would be improved with experiments performed close to the maximum and minimum levels of each factor.

Fig. 2

Frequency distributions (vertical bars) of the S. thermophilus growth parameters estimated from 500 runs in the Monte-Carlo simulations. Vertical lines: initial and average values of the model parameters. Horizontal lines: 95% confidence intervals of model parameters. Underlined number: estimated value from the experimental data. Italicized number (± 95% confidence interval): estimated value from the Monte-Carlo analysis

Fig. 3

Frequency distributions of the L. bulgaricus growth parameters estimated from 500 runs in the Monte-Carlo simulations. Vertical lines: initial and average values of the model parameters. Horizontal lines: 95% confidence intervals of model parameters

The values of the parameters estimated from the fitting with the one-step modeling approach and from the Monte-Carlo analysis were similar, corroborating Akkermans et al. [2], who stated that average values of the parameter estimates approximated the nominal (given) values. Every parameter estimated in the one-step modeling approach was inside the 95% confidence interval of the model parameters estimated in the Monte-Carlo analysis, as can be seen in Figs. 2 and 3. These results are desirable and reinforce that the one-step modeling approach can provide reliable estimates of the model parameters.

Internal model validation

The values of the statistical indexes \(\mathrm{RMSE}\), \({\mathrm{\%B}}_{\mathrm{f}}\), \({\mathrm{\%D}}_{\mathrm{f}}\), and \(\mathrm{MAE}\) calculated in the internal validation of the primary and secondary model parameters of S. thermophilus and L. bulgaricus estimated from the two-step and the one-step modeling approaches are shown in Table 4. The \(\mathrm{RMSE}\) values calculated from the one-step approach were, in general, lower (averages of 0.125 and 0.090 log CFU/mL for S. thermophilus and L. bulgaricus, respectively) than the values from the two-step approach (averages of 0.306 and 0.104 log CFU/mL for S. thermophilus and L. bulgaricus, respectively). The same behavior was observed for the \({\mathrm{\%D}}_{\mathrm{f}}\) values (averages of 1.5% and 1.2% for S. thermophilus and L. bulgaricus, respectively) and the \(\mathrm{MAE}\) values (averages of 0.100 and 0.070 log CFU/mL for S. thermophilus and L. bulgaricus, respectively). Therefore, these results indicate that the parameters estimated from the one-step approach resulted in lower mean residuals and lower discrepancy between the model and the experimental observations than those estimated from the two-step approach. For comparison, Pin et al. [27] stated that values between 25 and 50% for the discrepancy between model predictions and observations have been reported as acceptable when validating other models.
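For readers who want to reproduce this kind of comparison, the four indexes can be computed as in the sketch below. The definitions are assumptions based on forms commonly used in predictive microbiology (percent bias and percent discrepancy derived from the bias and accuracy factors of the predicted-to-observed ratios, and RMSE divided by the number of points); the exact expressions used in this study are those of its methods section and may differ. The observed and predicted values are hypothetical.

```python
import math


def validation_indexes(observed, predicted):
    """Return RMSE, MAE, %Bf and %Df for paired observations and predictions."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r ** 2 for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n

    # Natural log of the predicted/observed ratios (values must be positive).
    log_ratios = [math.log(p / o) for o, p in zip(observed, predicted)]
    ln_bias = sum(log_ratios) / n                               # ln(Bf)
    ln_disc = math.sqrt(sum(lr ** 2 for lr in log_ratios) / n)  # ln(Df)

    # Signed percent bias: positive for overprediction, negative for underprediction.
    pct_bias = math.copysign(math.exp(abs(ln_bias)) - 1.0, ln_bias) * 100.0
    pct_disc = (math.exp(ln_disc) - 1.0) * 100.0
    return {"RMSE": rmse, "MAE": mae, "%Bf": pct_bias, "%Df": pct_disc}


# Hypothetical counts in log CFU/mL, NOT data from this study.
observed = [6.10, 6.85, 7.90, 8.70, 9.05]
predicted = [6.20, 6.80, 7.95, 8.60, 9.10]

for name, value in validation_indexes(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

Working with log ratios keeps over- and underpredictions symmetric, so a prediction 25% too high and one 25% too low in the multiplicative sense give percent biases of equal size and opposite sign.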

Table 4 Statistical indexes (root-mean-squared errors, \(RMSE\); percent bias, \({\%B}_{f}\); percent discrepancy, \({\%D}_{f}\); and mean absolute errors, \(MAE\)) calculated in the internal validation of the primary and secondary model parameters of S. thermophilus and L. bulgaricus estimated from two-step and one-step modeling approaches

The \({\mathrm{\%B}}_{\mathrm{f}}\) values calculated from the one-step modeling approach were, in general, closer to zero (averages of 0.0% and − 0.1% for S. thermophilus and L. bulgaricus, respectively) than the values from the two-step modeling approach (averages of 2.0% and − 0.2% for S. thermophilus and L. bulgaricus, respectively). Dalgaard (2000) [14] considers that the performance of a model developed for spoilage bacteria is acceptable when the bias factor is lower than 1.25, which corresponds to a \({\mathrm{\%B}}_{\mathrm{f}}\) lower than 25%. Therefore, these results indicate that the parameters estimated from the one-step approach resulted in a lower tendency of the model to underestimate or overestimate the experimental observations than those estimated from the two-step approach.
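The correspondence between a bias factor of 1.25 and a percent bias of 25% can be made explicit. The expressions below assume a commonly used definition of the bias factor \({\mathrm{B}}_{\mathrm{f}}\) from the ratios of predicted values \({\hat{\mathrm{y}}}_{\mathrm{i}}\) to observed values \({\mathrm{y}}_{\mathrm{i}}\) over \(\mathrm{n}\) points; these symbols are introduced here for illustration, and the definitions adopted in this study are those of its methods section.

\[
{\mathrm{B}}_{\mathrm{f}}=\exp\left(\frac{1}{\mathrm{n}}\sum_{\mathrm{i}=1}^{\mathrm{n}}\ln\frac{{\hat{\mathrm{y}}}_{\mathrm{i}}}{{\mathrm{y}}_{\mathrm{i}}}\right),\qquad
{\mathrm{\%B}}_{\mathrm{f}}=\mathrm{sgn}\left(\ln {\mathrm{B}}_{\mathrm{f}}\right)\left(\exp\left|\ln {\mathrm{B}}_{\mathrm{f}}\right|-1\right)\times 100\mathrm{\%}
\]

\[
{\mathrm{B}}_{\mathrm{f}}=1.25\;\Rightarrow\;{\mathrm{\%B}}_{\mathrm{f}}=(1.25-1)\times 100\mathrm{\%}=25\mathrm{\%},\qquad
{\mathrm{B}}_{\mathrm{f}}=0.8\;\Rightarrow\;{\mathrm{\%B}}_{\mathrm{f}}=-\left(\frac{1}{0.8}-1\right)\times 100\mathrm{\%}=-25\mathrm{\%}
\]

Under this assumed definition, an overprediction by a factor of 1.25 and an underprediction by the reciprocal factor give percent biases of the same magnitude and opposite sign.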

The experimental data and the mathematical model curves obtained in the internal model validation with the parameters estimated from the one-step modeling approach are shown in supplementary file 2. The most pronounced overprediction (dataset 3, \({\mathrm{\%B}}_{\mathrm{f}}=2.4\mathrm{\%}\)) and underprediction (dataset 7, \({\mathrm{\%B}}_{\mathrm{f}}=-1.6\mathrm{\%}\)) of the growth of S. thermophilus occurred at lower pH values (5.7 and 5.36). The biased predictions of L. bulgaricus were less pronounced; the highest overprediction (dataset 2, \({\mathrm{\%B}}_{\mathrm{f}}=1.2\mathrm{\%}\)) occurred at a high pH value (6.5). Therefore, the biased predictions were associated more with pH levels farther from the optimum of each species than with the temperature.

Conclusion

The Baranyi and Roberts primary model and Rosso and coworkers’ secondary model were able to describe the growth data of pure cultures of S. thermophilus and L. bulgaricus. The model fitting with the one-step modeling approach showed better statistical results (higher \({\mathrm{R}}^{2}\), lower \(\mathrm{RMSE}\), and narrower 95% confidence intervals of model parameters) than the two-step approach. The values of the eight growth parameters (\({\upmu }_{\mathrm{opt}}\), \({\mathrm{T}}_{\mathrm{min}}\), \({\mathrm{T}}_{\mathrm{opt}}\), \({\mathrm{T}}_{\mathrm{max}}\), \({\mathrm{pH}}_{\mathrm{min}}\), \({\mathrm{pH}}_{\mathrm{opt}}\), \({\mathrm{pH}}_{\mathrm{max}}\), and \({\mathrm{y}}_{\mathrm{max}}\)) for each culture estimated from the fitting with the one-step modeling approach and from the Monte-Carlo analysis were similar, a desirable result that reinforces that the one-step modeling approach can provide reliable estimates of the model parameters. In the internal model validation, the averaged \(\mathrm{RMSE}\) (0.125 and 0.090 log CFU/mL), \({\mathrm{\%D}}_{\mathrm{f}}\) (\(1.5\%\) and \(1.2\%\)), and \(\mathrm{MAE}\) (0.100 and 0.070 log CFU/mL) values for S. thermophilus and L. bulgaricus, respectively, indicate that the parameters resulted in low mean residuals and low discrepancy between the model and the experimental observations. \({\mathrm{\%B}}_{\mathrm{f}}\) values close to 0 (averages of 0.0% and − 0.1% for S. thermophilus and L. bulgaricus, respectively) indicate that the estimated parameters resulted in a low tendency of the model to underestimate or overestimate the experimental observations.

The results of the current study can help researchers to improve their knowledge of the effect of temperature and pH on the growth of starter cultures of milk products, including the importance of choosing adequate levels of each factor in the experimental design, the best modeling approach for the parameter estimation, and the need to validate the model. Furthermore, the values of the growth parameters estimated for each species (S. thermophilus and L. bulgaricus) and their respective ± 95% confidence intervals can be helpful to further develop the fermentation process.