Key words

1 Introduction

In recent decades, food quality and safety concerns have been growing. There are physical, chemical, and biological risks, which can affect food integrity and safety [1]. Foods are subject to food contamination by microorganisms, pathogenic or spoilage bacteria, which can cause food batch recall and foodborne diseases and impact the consumer’s safety [1]. Due to the expansion of the food trade, it is possible to notice a relative difficulty in managing risks and ensuring the protection of consumers’ health [2].

The unit operations in the food chain can affect microorganism viability leading to inactivation. Thereby, it becomes crucial to know the microbial behavior parameters to understand microbial growth dynamics [3] and the responses of microorganisms to specific environmental conditions [3, 4]. Data collection in different environmental conditions enables assessing microbial kinetics and can predict responses in other similar environments through mathematical models [5]. Predictive microbiology uses mathematical models to quantify the effect of intrinsic and extrinsic factors (i.e., temperature, pH, water activity, autochthonous microbiota, and natural antimicrobial compounds) on microbial behavior. Predictive models can predict growth, inactivation parameters, and toxin production [6, 7]. The responses provided by the models can support food processors and regulatory agencies in data-driven decision-making.

The first description using a model was done by Bigelow and Esty [8], Bigelow [9], and Esty and Meyer [10]. Although the concept of “predictive microbiology” was first proposed in 1937, it was not thoroughly applied until the early 1980s, when the response to large food poisoning outbreaks spurred efforts to apply mathematical models. These efforts were used in pathogen inactivation (for instance, for Clostridium botulinum and Staphylococcus aureus) and measuring the spoilage bacteria growth [11]. In the 1960s and 1970s, studies applied mathematical models aimed toward the inactivation of bacteria and fungi. The predictive microbiology field was raised in the 1980s and 1990s. This intensification was attributed to accessibility to computational tools and software, which allowed the use of more complex and more accurate models [12].

Predictive microbiology aims to mathematically represent a microbiological process’s reality and quantify its intrinsic and extrinsic effects [13]. Predictive microbiology has emerged at the interface of different areas of knowledge, including microbiology, statistics, mathematics, and computation. It has become an essential tool for data-driven decision-making [14]. Due to the importance of individual factors for each food type, data collection for predictive microbiological models is mainly based on laboratory data. However, there is also an increasing amount of research focused on purposely contaminated food to generate data for predictive microbiology [6].

The US Department of Agriculture-Agricultural Research Service (USDA-ARC) and the UK Ministry of Agriculture, Fisheries and Food proposed the first predictive microbiology software’s approach in the 1990s [15]. The systems were called Pathogen Modeling Program and Food Micromodel, respectively. Both tools have a database and mathematical models to describe growth responses to environmental factors of foodborne pathogens [4]. Afterward, other tools emerged for predictive microbiology, such as Seafood Spoilage Predictor (currently called Food Safety Spoilage Predictor) [16], Dmfit, Ginafit [17], Microbial Responses Viewer [18], MicroHibro [19], Combase [20], and Bioinactivation [21].

The guarantee of food safety and quality can be impacted by the emergence of innovative food products such as new preserving food technologies. Thus, investigations that evaluate possible complications that may affect the quality of products are necessary. Also, current knowledge that are already available, the effects of processing new products, and other factors are of great importance to consider when developing predictive models [14].

Additionally in the food industry, authorities are seeking solutions and tools to mitigate or solve problems related to food safety. Considering the challenges to maintain the food quality and safety throughout the food chain, it is feasible to apply predictive modeling in the food industry [11]. Therefore, it is essential to plan the entire process and collect microbiological data that adequately reproduce the behavior of the microorganism in that specific study environment as well as choosing the appropriate model which relies the studies [22].

This book will cover the particularities of food matrices, the relationship between foods and microorganisms, and data collection for predictive models. Therefore, understanding microbial behavior and the influence of environmental conditions is fundamental for properly developing studies based on predictive food modeling.

2 Application of Predictive Models

The application of predictive models serves as a science-based tool that can aid shelf life studies and help design or reformulate food products based on a safety and quality perspective, meeting ongoing needs for quality and food safety [23]. The predictive microbiology applications are shown in Fig. 1.

Fig. 1
A chart exhibits the uses of predictive microbiology. It is used in experimental draw, microbiological risk assessment and management, hazard analysis and critical control points, education, innovation and development of new products, shelf life, and hygiene measures and temperature integration.

Application of predictive microbiology

Predictive models provide a quantitative view of the effect of intrinsic (see Note 1) and extrinsic factors (see Note 2) on the quality and safety of food [14]. For instance, in quantitative microbial risk assessment (QMRA), predictive models can estimate the impact of unit operation along the food chain on microbial behavior and quantify the risks associated with contaminated food consumption [24]. Furthermore, the predictions by the models can support decision-making processes, highlighting its integration into self-control systems, such as hazard analysis and critical control points (HACCP), to determine process criteria and control limits [25]. Predictive microbiology can also optimize and validate new thermal and nonthermal food processing technologies due to its ability to assess the process impact against microbial inactivation and describe the response of microorganisms if there is any environmental change [26]. Therefore, predictive microbiology is a valuable tool for food quality and safety decision-making.

3 Concepts for Predictive Microbiology and Mathematical Modeling

An appropriate understanding of mathematical and microbiological concepts is essential for developing sound foundations for predictive microbiology. This understanding is necessary for creating realistic models and exploring various mathematical tools and methods in this field. For instance, to determine whether a proposed inactivation technology can be used, it is crucial to have an essential understanding of the characteristics of microorganisms and other relevant concepts in microbiology.

3.1 Microbiological Concepts

The logarithmic scale is commonly used to represent microbial growth and express populations graphically. It allows for better visualization and understanding of growth stages on a curve. Within an analytical scenario, various techniques serve to enumerate microorganism purposes and can be divided into two major groups: quantitative and qualitative analysis methods. Quantitative analysis involves numerically measuring a compound, group of compounds, or parameters (see Note 3), such as calculating colony forming units (CFU) per gram or milliliter, most probable number (MPN) [27] (see Chapter 3), or number of cells by microscopy [28] (see Chapter 7). In contrast, the qualitative analysis does not measure the specific amount of a compound or the value of an analysis parameter. Instead, it aims to indicate the presence or absence of the analyzed parameter or boundaries for microbial growth (see Chapter 6).

Microbial growth involves increasing the number of individuals (cells) through processes such as binary fission, budding, spore formation, or fragmentation. This growth is characterized by a “generation time,” which is required for a cell to divide (see Chapter 7) (assuming binary fission as the most common process) or for a population to duplicate itself. The generation time may vary depending on the microorganism or medium temperature [29]. Understanding the relationship between generation time and microbial growth phases is crucial, including the comparison between phases [30]. Microbial growth occurs in four stages, as Peleg and Corradini [31] showed in Fig. 2.

Fig. 2
A profile plots log base 10 cells versus time. A line begins along the y axis, remains constant for the lag phase, and increases gradually for the log phase for a certain time period. Then it remains stable for the stationary phase and again decreases gradually for the death phase.

Microbial growth curve

Lag phase: This phase can last for an hour or even several days, during which cells are in a state of latency growth and adjusting the metabolism before starting exponential growth. There is little or no cell division during this phase, resulting in no significant increase in the number of cells.

Log phase or exponential growth phase: Following the end of the latency state, a phase of increased metabolic activity begins. During this phase, cells undergo logarithmic cell division, and the generation time becomes constant. The environmental conditions influence this phase.

Stationary phase: This phase corresponds to a period of equilibrium in which the number of cell death equals the number of new cells, and metabolic activity decreases. Decline phase or cell death: In this phase, the number of dead cells exceeds the number of new cells, and this trend may continue until only a small fraction of cells remains, or no cells are present.

It is important to observe the different microbial growth phases when analyzing processes for developing new products or assessing the effectiveness of microbial control methods. This can help use predictive models to estimate the impact of environmental factors and process conditions on microbial behavior [13].

3.2 Mathematical Concepts

Predictive microbiology approaches are based on operational research applications and mathematical modeling [32]. Operational research term, also known as management science, is a scientific approach to solving complex problems, aiming to make the best possible decision, usually in a scenario with limitations, such as the scarcity of resources in a system [33].

Mathematical modeling is an operational research tool to elaborate the decision-making system or to understand a complex and real problem logically and integrally. These models are represented mathematically by equations [34]. Usually, it is an empirically elaborated model (Fig. 3) based on laboratory result observation in a controlled environment, and it aims to predict the behavior of a system to guide decision-making about a problem situation.

Fig. 3
A schematic diagram exhibits the basic structure of a mathematical model. It starts with input, flows to the mathematical function, and results in output.

Schematic representation for model development

For this, it is necessary to understand deeply the variables (see Note 4) involved in the problem and the nonmathematical issues surrounding it, for example, the reasons for the limitations imposed on the model, either because of the search for low-cost production or the impossibility of having a variable with a negative value. Then, it must analyze which resources of reality will be used in constructing the model and which ones should be ignored, aiming at optimizing the system, with a good fit (or with low bias) and its accuracy compared to reality [35]. Modeling helps us understand biological systems better and can help us make predictions about their behavior. A model refers to a mathematical equation or set of equations that represent a biological process, system, or relationship. Such models usually involve several random variables and a combination of variables, constants, or parameters. A model that doesn’t include random variables is referred to as deterministic.

Different types of modeling are available, such as descriptive, prescriptive, optimized, static or dynamic, linear or nonlinear, complete or incomplete, deterministic or stochastic, and network models, each of which can be suitable for different problems and scenarios [34]. A predictive model aims to describe the behavior of a natural phenomenon, raising its variables and how they relate to the process in a deterministic way [13]. Some examples of mathematical models within the microbiological context are inactivation curves, growth of a given microorganism, and heat penetration curves, among others [36]. Table 1 shows the different types of mathematical models that can be applied in predictive microbiology.

Table 1 Description of terminology used for model in predictive microbiology

Linear mathematical models are characterized by presenting a linear function, called an objective function, with independent variables (x1;x2;xn …) related to linear constants (an. xn) that can be both equalities (=) and inequalities (≥ or ≤). Constraint elements are also involved in this system, applied under the conditions of the decision variables. For example, they are determining whether a variable can or cannot have a negative value. Meanwhile, nonlinear models can form a convex, concave function, and others. Also included in this group of mathematical models are functions of zero order, first order, second order, and others, leading to different ways of optimizing and solving the model [37].

System resolution can use various techniques, including simplifying the model, dividing it into subproblems, and simplifying the equations without considering the system’s restrictions. Another critical point inside nonlinear models is the polynomial model. A group is also known as the response surface model (Fig. 4) [38]. The characteristic is that its objective function and constraints are well-defined functions, nonlinear in polynomial format, which does not always occur in all nonlinear mathematical models. One of the defining features of this type of model is that its objective function and constraints are well-defined, nonlinear polynomial functions. This is not always the case with other nonlinear mathematical models. These models can handle problems involving sine, radicals, and logarithmic functions, making them useful in predictive microbiology, where multiple determinations must be made simultaneously [39]. They are classified as secondary and empirical models, often called “black box” models. By using terms of the first, second, third, and fourth-order (quadratic function), it is possible to simultaneously evaluate the effect of multiple variables on the system being researched in the polynomial form [40,41,42]. However, it’s important to note that the function may demonstrate exponential growth as the number of variables increases.

Fig. 4
Three 3 D surface plots of L n of G R versus P, T, and N a C l. a. A plane with a downward parabolic trend in the T and p H axes. b. A plane is plotted in the T and N a C l axes. c. A plane is plotted in the p H and N a C l axes.

The 3D surface plots of growth rate (GR) of Clostridium sporogenes spores affected by (a) temperature (T) and pH, (b) temperature (T) and NaCl, and (c) pH and NaCl of RS models. (Reuse with permission. Dong et al. [38])

Another type of model is the stochastic model, which can be used in consumer brand choice research or microorganism growth studies, taking into account multiple random behavior variables, such as temperature, pH, and storage temperature variability. In addition, it seeks to describe the uncertainty and variability of the microbial response in the system [43].

Finally, network models can form artificial neural networks, representing the interdependence between the factors involved in the system. They are a type of artificial intelligence that has been emerging as a promising tool in treating biological data, for example, developing models to describe microbial growth in specific culture media. After all, artificial neural networks (ANN) can identify multiple parameters of a model (linear or nonlinear) in a discriminative way, organizing them logically for analysis, even coming directly from laboratory results (black box models), which makes decision-making of the problem situation more complete and increasingly data-driven, along with more minor estimated errors [44].

These networks can perform parallel calculations of several functions of the developed model. For this to happen, it is necessary to draw the neural layers and the mathematical tasks between them (whether linear or nonlinear, simple or polynomial) and determine the level of connectivity between one vertex and another [45]. In order to fully comprehend predictive modeling, mathematical and statistical concepts are essential. These concepts encompass variables, optimal and suboptimal solutions, objective function, model constraints, sensitivity analysis, regression analysis, prediction, parameters, and simulation.

Different mathematical models can describe and predict microbial behavior in food systems in predictive microbiology. White box or mechanistic models are based on the underlying mechanisms and physical processes that govern microbial growth and survival, such as nutrient uptake, metabolism, and environmental conditions. These models are typically more complex and require more detailed information about the system. Still, they can provide a deeper understanding of the underlying biological processes and be used to design more effective control strategies. On the other hand, black box models are generated purely from experimental data and do not necessarily rely on a deep understanding of the underlying mechanisms [47]. These models are often more straightforward but may not provide as much insight into the underlying biological processes.

Finally, some intermediate models combine both white box and black box approaches. These models may incorporate some mechanistic knowledge to improve the accuracy and robustness of the model while still relying on experimental data for parameter estimation and validation. Overall, the choice of model type depends on the specific research question, available data, and level of understanding of the system [14].

3.3 Types of Predictive Models

Predictive models are becoming valuable and fast tools in the search for answers to specific problems and can be used for several purposes, i.e., models of growth (see Chapter 4) and microbial death (see Chapter 5) as well as boundary conditions such as growth versus no-growth (see Chapter 6). Predictive models can be utilized to evaluate microbial growth, survival, and inactivation, as well as boundary conditions such as growth versus no-growth. These models can be classified through primary, secondary, and tertiary levels [48].

3.3.1 Primary Models

Primary models measure the microorganism response over time for a single set of conditions, allowing growth and decline curves to be generated. The responses provided by primary models are inactivation and growth rate, delay time, or times of turbidity/toxin formation [49].

3.3.1.1 Growth Models

Primary growth models are divided into sigmoidal and mechanistic functions. Sigmoidal curves describe the effect of biochemical reactions on microbial growth rate [50]. There are two approaches for primary growth models: sigmoidal and mechanistic functions. Sigmoidal curves are used to depict the impact of biochemical reactions on microbial growth rate. The most commonly used models for this are the logistic (Eq. 1) [50] and Gompertz modified (Eq. 2) [51] models. On the other hand, mechanistic approaches reveal that the maximum specific growth rate and the lag depend on the environment. Baranyi and Roberts (Eq. 3) [52] came up with a mathematical model that demonstrates an inversely proportional lag to the maximum specific growth rate:

$$ Y(t)={Y}_0+{Y}_{\mathrm{max}}-\ln \left\{\exp \left({Y}_0\right)+\left[\exp \left({Y}_{\mathrm{max}}\right)-\exp \left({Y}_0\right)\right]\exp \left(-{\mu}_{\mathrm{max}}t\right)\right\} $$
(1)
$$ Y(t)={Y}_0+\left({Y}_{\mathrm{max}}-{Y}_0\right)\exp \left\{-\exp \left[\frac{2.71{\mu}_{\mathrm{max}}\left({t}_{\mathrm{lag}}-t\right)}{Y_{\mathrm{max}}-{Y}_0}+1\right]\right\} $$
(2)
$$ \left\{\begin{array}{l}Y(t)={Y}_0+{\mu}_{\mathrm{max}}A(t)-\ln \left[1+\frac{\exp \Big({\mu}_{\mathrm{max}}A(t)-1}{\exp \left({Y}_{\mathrm{max}}-{Y}_0\right)}\right]\\ {}A(t)=t+\frac{1}{\mu_{\mathrm{max}}}\ln \left[\exp \left(-{\mu}_{\mathrm{max}}t\right)+\exp \left(-{\mu}_{\mathrm{max}}{t}_{\mathrm{lag}}\right)-\exp \left(-{\mu}_{\mathrm{max}}t-{\mu}_{\mathrm{max}}{t}_{\mathrm{lag}}\right)\right]\end{array}\right. $$
(3)

where Ymax is maximum population; Y0 is initial population; μmax is maximum growth rate; and tlag is the lag time.

3.3.1.2 Inactivation Models

Predictive models can be used to describe microbial kinetics inactivation resulting from thermal or nonthermal processes. Microbial populations are typically measured at discrete time points to construct an inactivation curve. The resulting survival curve may show linear or nonlinear behavior.

Primary inactivation models allow us to estimate inactivation parameters such as inactivation rate (kmax), shoulder length (SI), and tail formation (residual population) (yres). Regarding describing inactivation behavior, several authors developed equations for this purpose. The linear model [8] (Eq. 4) was the first inactivation model to apply fit inactivation kinetics. Nonlinear models with shoulder and tail [53] (Eq. 5) describe a lag time before the inactivation and a residual population after the treatment, tail region. Further, the concave and convex curves, due to biological variations in inactivation, may be expressed as a statistical model of the distribution of inactivation times, Weibull model [41] (Eq. 6):

$$ y(t)={y}_0-\frac{t}{D} $$
(4)

where y0 is initial population and D is decimal reduction time.

$$ y(t)={y}_{\mathrm{res}}+\log \left[\frac{\left({10}^{y_0-{y}_{\mathrm{res}}}-1\right)\ \exp \left({k}_{\mathrm{max}}{S}_{\mathrm{l}}\right)}{\exp \left({k}_{\mathrm{max}}t\right)+\exp \left({k}_{\mathrm{max}}{S}_{\mathrm{l}}\right)-1}+1\right] $$
(5)

where yres is residual population, y0 is initial population, kmax is inactivation rate, and Sl is shoulder length.

$$ y(t)={y}_0-{\left(\frac{t}{\delta}\right)}^p $$
(6)

where y0 is initial population, δ is time for first decimal reduction, and p is curvature parameter.

3.3.2 Secondary Models

Secondary models describe the effects of multiple variables on microbial behavior in food. These models are based on the primary models, which describe microbial growth, survival, or inactivation as a function of environmental factors such as temperature, pH, and water activity. Secondary models consider additional factors influencing microbial behavior, such as other microorganisms, food components, and chemical or physical treatments. Examples of secondary models include competitive growth models, which describe the growth of multiple microorganisms in a food matrix, and hurdle models, which represent the effect of numerous preservation factors on microbial growth or survival. Other secondary models include models for the impact of preservatives, thermal processing, and modified atmosphere packaging on microbial growth. These models can approach kinetics or probability description for the influence of intrinsic and extrinsic factors on either microbial growth, survival, or inactivation [49]. Probability models are developed through regression analysis to estimate the effects and interactions of independent variables, allowing for the assessment of the probability of toxin production, spore germination, or microbial growth boundaries. Meanwhile, kinetic models use mathematical functions, such as linear or nonlinear regressions and polynomial regressions, to assess the impact of environmental conditions on microbial kinetic parameters, i.e., inactivation or growth rate, lag phase duration, and maximum population density. Polynomial functions (Eq. 7) are widely employed in predictive microbiology to model the effects of environmental factors simultaneously [54]. This approach is commonly utilized to develop the increasingly popular square root [55] (Eq. 8) and cardinal parameter-type models [56] (Eq. 9). These models individually consider each environmental factor and then create a general model that describes their combined effects:

$$ y={\beta}_0+\sum \limits_{j=1}^k{\beta}_j{X}_j+\sum \limits_{j=1}^k{\beta}_{jj}{X}_j^2+\sum \limits_{j\ne 1}^k{\beta}_{jl}{X}_j{X}_l+\varepsilon $$
(7)

where β0, βj, βjj, βjl are the estimated coefficient regression, Xj and Xl are independent variables, and ε is error.

$$ \sqrt{\mu_{\mathrm{max}}}=b\ \left(T-{T}_{\mathrm{min}}\right) $$
(8)

where Tmin is the minimum temperature below which the maximum growth rate is equal to 0 and obtained through a linear regression of the square root of the maximum growth rate temperature.

$$ {\mathrm{CM}}_n(X)=\left\{\begin{array}{ll}0,& X\le {X}_{\mathrm{min}}\\ {}\frac{\left(X-{X}_{\mathrm{max}}\right)\left(X-{X}_{\mathrm{min}}\right)}{{\left({X}_{\mathrm{opt}}-{X}_{\mathrm{min}}\right)}^{n-1}\left[\left({X}_{\mathrm{opt}}-{X}_{\mathrm{min}}\right)\left(X-{X}_{\mathrm{opt}}\right)-\left({X}_{\mathrm{opt}}-{X}_{\mathrm{max}}\right)\left(\left(n-1\right){X}_{\mathrm{opt}}+{X}_{\mathrm{min}}- nX\right)\right]},& {X}_{\mathrm{min}}<X<{X}_{\mathrm{max}}\\ {}0& X\ge {X}_{\mathrm{max}}\end{array}\right. $$
(9)

where X is temperature, pH, or aw; Xmin and Xmax are values of X below and above which no-growth occurs, respectively; Xopt is the value X at which microbial growth is optimum; and n is a shape parameter. In optimal condition de CMn(Xopt), the value is equal 1 and for Xmin and Xmax the CMn is equal to 0 [57].

3.3.3 Tertiary Models

Finally, the tertiary level refers to computational tools that integrate primary and secondary models in friendly interfaces [58]. The tertiary level of predictive microbiology involves the integration of primary and secondary models into computational tools that provide user-friendly interfaces for predicting microbial responses. One of the primary challenges in predictive microbiology is managing large amounts of data and retrieving specific information efficiently. Decision-support tools, such as computer software programs, can help to address these challenges by providing users with a simplified visualization of model inputs and outputs through graphical interfaces. However, the development and application of information management systems in predictive microbiology have been limited compared to other scientific fields. This highlights the need for continued investment and development to ensure the predictive models are accurate and up-to-date. Overall, the tertiary level of predictive microbiology plays a critical role in advancing our understanding of microbial behavior and improving food safety. The development and integration of advanced computational tools may lead to play a critical role in improving our ability to predict and manage the behavior of microorganisms in a wide range of applications [59].

3.3.3.1 Types of Programing Languages, Tools, and Software Available to Predictive Microbiology

Tertiary models have become an important, if not indispensable, tool for elaborating and resolving complex problems related to assessing microbial behavior. Considering this context, software are becoming more common and enable more robust data analysis by considering a greater number of variables. It is also possible to use more comprehensive tools for modeling, such as Microsoft Excel and programming languages like VBA, R, and Python. However, providing a user-friendly interface for estimating microbial parameters is more beneficial for food processors, regulators, and the general public [14]. Regarding food safety management and assistance in decision-making, tertiary models provide insights into achieving food safety management measures in a short timeframe. Thus, different databases, software, and add-ins have been developed for predictive microbiology in the last 20 years. Table 2 presents a survey of characteristics and applications of some devices used in predictive microbiology.

Table 2 Tertiary models for predictive microbiology

4 How to Design to Predictive Microbiology: Collect Data and Design Experiment

Developing a predictive model requires careful planning and execution of the experimental design for evaluating the effect of intrinsic and extrinsic factors. A good experimental design ensures high-quality and reliable results from the data gathering. In addition, adequate planning and the organization of the experiment are crucial for ensuring the collected data has the necessary quality and microbial behavior to be studied [75].

During planning stage, the model should be designed to consider several parameters such as the central hypothesis of the study, the food matrix and its particularities, the target microorganism behavior in this matrix, the adequate volume of data to support the model, and the necessary resources [76].

Environmental factors that affect microbial behavior and how their interaction occurs should also be considered during the design process. Thus, it may lead to increase the chances of extracting the desired information and reduce the probability of excessive experimental work due to failure at the end of the process [77].

The experimental design should control and adapt the factors according to the design objectives and the hypothesis it aims to test. The physiological state of inoculum cells, substrate or culture medium needed, dependent variables proposed for the model, combinations of factors that may be included in the model, and how collected data are carried out are all important factors to consider during the designing process [76, 78].

The factors that must be controlled must be planned and adapted according to the experimental design objectives and the hypothesis being tested. Within the design process, it is essential to consider other specific details such as the physiological state of inoculum cells, the substrate or coculture medium required, the dependent variables proposed for the model, combinations of factors to be included in the model, and the method of data collection. The factors to be considered are shown in Fig. 5 [77].

Fig. 5
A flow diagram of the stages to develop predictive model. It comprises planning, experimental, and data handling stages. It starts with define goals, followed by define dependent variables, independent variables, prepare the inoculum, control factors, microbial data, and ends at validate the model.

Stepwise to develop a predictive model

For the experimental design, a procedure containing two stages can be applied, as suggested by [14] (i) performing screening experiments in a wide range of factors and (ii) conducting a data collection study within the region of interest, including additional levels of factors that result in a more refined model and more accurate forecasts. Thus, experimental designs are used in predictive modeling to “reduce the number of experiments,” making it an important stage for the development of models.

The collected data and the number of points needed to represent microbial behavior are crucial to reducing the uncertainties related to this behavior. Thus, the distribution of the collected points within the experimental design becomes fundamental for estimating the dependent variables. As a result, the representativeness of the model increases, and there is a reduction in the variance of the estimated parameters.

Experimental designs are used to model microbial responses in foods. The advantages of using practical methods correspond to the “ease of implementation together with its data processing.” In addition, all combinations are explored, as all information will be obtained from the experiment, making it an easy tool to handle statistically. It is possible to develop the experimental design through complete factorial designs, fractional factorial designs, or central composite designs, depending on the number of variables in the experiment [77].

4.1 Design Experiment

4.1.1 Complete Factorial Design

The complete factorial design allows a complete investigation of all combinations of different variables, enabling direct modeling of interactions, in which it becomes possible to evaluate the environmental factors that influence the growth and inactivation of microorganisms. The complete factorial design has significant advantages, as it is easy to implement and process your data, as all combinations are explored. Nevertheless, this design has the disadvantage that a new factor/level is added to the experiment. There will be a significant increase in the number of experiments, which can become laborious and expensive [79]. The number of experiments that will be performed can be calculated as 2k, where k corresponds to the number of independent variables in the experiment [80].

4.1.2 Fractional Factorial Design

Fractional factorial designs are commonly used in situations where there are a large number of independent variables, as they help to reduce the number of experiments required. These designs are based on prior knowledge or assumptions about the most important factors or expected interactions. However, they can be more complex to develop than complete factorial designs, as the different parameters that should be included in the experimental plan must be carefully considered [81].

In order to define the matrix of a combination of levels of independent variables, statistical software is often necessary. One type of fractional factorial experiment is the Box-Behnken design, which combines two-level factorial experiments with balanced incomplete blocks and includes additional experiments in the central points of the design to test the repeatability of the adopted model design. Another type of fractional factorial experiment is the factorial Latin square design, which corresponds to a Latin square of order x, an arrangement of x letters in an x by x matrix. In this design, each letter appears once in each line and column inside a square block or field [81, 82].

For experiments in predictive microbiology, the Latin square design typically corresponds to a primary factor known as “treatment,” represented by randomly distributed letters appearing only once in a row or column. This design is useful when several blocking factors, called “annoyances,” need to be controlled, even though the approach is not limited to the main factor of interest. The Latin square design corresponds to an example of an incomplete block. It can be extended to more individual factors, used as the Greco-Latin square or the Hyper Graeco-Latin designs [82, 83].

4.1.3 Central Composite Design

The central composite design is commonly used for predictive food modeling experiments and involves a complete factorial experiment with a limited number of levels per environmental factor, making it suitable for studying simpler model structures. This design includes a complete factorial experiment with a set of central points and two axial points on the axis of each design variable at a distance of a from the design center, as described by Eq. 10. The number of experiments for k variables can be determined using the equation, with n0 representing the number of experiments at the central point. Additionally, a set of “star points” is included to estimate curvature and enhance the central points [84]:

$$ {2}^k+2k+{n}_0,\left({n}_0\ge 1\right) $$
(10)

where k is the number of independent variables and n0 is the number of experiments of central point.

4.1.4 Doehlert Matrix

The Doehlert matrix is a type of experimental design that is characterized by points that are uniformly spaced on concentric spherical shells. This design is also known as a uniform shell design, and it aims to fill the experimental space uniformly [85]. The number of designs is determined by the number of variables, k, and the number of center points, n0, according to the formula k2 + k + n0. For instance, if n0 = 1 and k = 4, there will be 21 experiments. The center experiment can be repeated multiple times to estimate the observed variance, which can be used for model validation. One of the advantages of the Doehlert matrix is that it can be easily expanded by adding new variables or increasing the range of the tested parameters, providing flexibility for the experimental design [86].

4.2 Growth Matrices and Microbial Strain

Predictive modeling in microbiology requires an understanding of the unique characteristics of the matrix that provides conditions for microbial growth. The behavior of microorganisms can be explained through significant environmental factors such as temperature, pH, and water activity but also by structural compositions of the matrix. This fact is essential to consider in the development of predictive models. For example, microbial growth occurs more rapidly in a liquid medium than in food media [87]. Thus, when comparing predictions generated from liquid matrices with observations for food ones, it is necessary to consider the potential factors that may result in divergences. These factors may include the presence of native microbiota or additional environmental factors in food matrices that were not included in the models. For detailed information, see Chapter 3.

An appropriate microbial strain is critical in developing a reliable and accurate predictive model. Firstly, the chosen strain should be representative of the microbial population of interest in the food matrix and be relevant to the food matrix and its storage conditions. The selection encompasses the origin, including different strains of foodborne pathogens, physiology, and growth characteristics of the strain. Depending on the study purpose, strains or cocktails can be used. This allows the assessment of the variability related to resistance to stress, behaviors, and the influence of lineage variability given to changes in the environment between different lineages of the same species, thus increasing the representativeness of the situation found in the analyzed food [88]. Alternatively, surrogate microorganisms can be used instead of specific pathogens when the pathogen is not indicated. The surrogate should have similar behavior and kinetics characteristics, except for virulence, similar to the target pathogens [89].

The microorganisms, vegetative cells, or spores selected for the study must be incubated in standardized conditions, preferably similar to those found in food. Also, inoculum preparation is a crucial step for predictive model development. For more information about the selection and inoculum preparation and method of inoculation, refer to Chapter 2.

4.3 Goodness Index Fit and Model Validation

After elaborating and refining the mathematical model, it is necessary to test it to verify its accuracy in the face of reality, being the last step of the modeling cycle [90].

Validation methods can be divided into graphical and mathematical methods. The first one allows quick visualization and interpretation of the objective data and is characterized by creating a graph where the expected values according to the model and the values acquired in a laboratory way are plotted [91, 92]. Mathematical validation implies the calculation of statistical factors, for example, mean squared error, accuracy factors, and bias, among others [13, 93, 94].

After performing the curve fitting, the model must be validated using statistical indexes, constantly evaluating which indexes best apply to each type of model, taking into account its characteristics, such as whether the predictive model is a linear function or not [95]. For more information, see Chapter 10 for model validation.

4.3.1 Goodness Fit Index

The goodness fit indexes are used to evaluate the fit of a model. Some metrics, such as R2, RSS, and RSME, can provide different information about the accuracy and precision of the model’s predictions [96].

4.3.1.1 R-Squared

R-squared (R2) is a statistical measure that indicates how well the model fits the data, that is, it evaluates the performance of a security or fund (dependent variable) in relation to a given reference index (independent variable) (Eq. 11). R2 adjusted (Eq. 12) determines the extent of variance in the dependent variable that can be explained by the separate variable:

$$ {R}^2=\frac{\sum \limits_{i=1}^n{\left({\hat{y}}_i-\frac{1}{n}\sum \limits_{i=1}^n{y}_i\right)}^2}{\sum \limits_{i=1}^n{\left({y}_i-\frac{1}{n}\sum \limits_{i=1}^n{y}_i\right)}^2} $$
(11)
$$ {R}_{\mathrm{adj}}^2=1-\left(1-{R}^2\right)\frac{n-1}{n-k-1} $$
(12)

where n is the number of observations, yi is the observed value for sample i, \( {\hat{y}}_i \) is the value of the prediction for sample i, k is the number of parameters, and n is the number of sample data.

4.3.1.2 RSS

Residual sum of squares (RSS), also referred to as the sum of squared errors (SSE), is a statistical method that indicates whether the regression model fits the actual dataset well or not. It measures the variance of the value of the observed data when compared to its value predicted by the regression model (Eq. 13):

$$ \mathrm{RSS}=\sum \limits_{i=1}^n{\left({y}_i-\hat{y}\right)}^2 $$
(13)

where yi is the observed variable value and \( \hat{y} \) is the value estimated by regression line.

4.3.1.3 RMSE

RMSE (root mean squared error) is a measure of absolute error that squares the deviations to prevent positive and negative deviations from canceling out. This measure also tends to overestimate large errors, which can help weed out methods with these errors (Eq. 14):

$$ \mathrm{RMSE}=\sqrt{\frac{\mathrm{RSS}}{n-k}} $$
(14)

where n is the number of samples and k is the number of parameters.

5 Limitation

Although microbial predictive models have brought significant advancements to food safety, addressing some limitations associated with their development and application is essential. Adequate knowledge of the initial conditions, understanding the challenges encountered during the study of food matrices and production processes, and considering the variability of the outcomes are crucial for developing reliable predictive models. It is essential to assess the relevance of the model system for the specific food of interest and the potential limitations during data acquisition and the modeling process. Failure to address these limitations can render the execution of the modeling unfeasible and result in unsatisfactory results [97].

6 Notes

  1. 1.

    Extrinsic Factors: These factors are related to the environmental conditions around the analyzed sample, also known as environmental limitation factors. Some examples are the relative humidity, temperature, and gaseous composition of the environment in which this sample is inserted.

  2. 2.

    Intrinsic Factors: Intrinsic factors are those related to the sample in which the microorganism is found, thus called substrate limitations. Some examples of this type of factor are composition and availability of nutrients in the medium, pH, redox potential (Eh), barriers and components with antimicrobial action, as well as the water activity of the substrate.

  3. 3.

    Parameters: They can be either a value related to a quantity (as the flow and volume parameters are related to the magnitude of a liquid, or the average parameter, which is related to the population quantity) or a variable that can take different values and, as a consequence, change other values of an equation.

  4. 4.

    Variables: Variables are generally represented by a letter, which differs from the preestablished values of an equation. They can be divided into dependent (where there are also interdependent) and nondependent (independent). The first group is related to some independent variable, and this second group can also be called a factor. Dependent variables, also known as decision variables, are under the system modeler’s control and influence.