Introduction

Consumer choice is influenced by color, especially in food, which has led to an increase in the use of artificial colorants by the food industry over the last century [1]. However, several artificial colorants have been banned from industrial products because of their harmful effects on human health [2, 3] and the environment [4]. Thus, there has been increased interest in colorants from natural sources, which in most cases can be considered safe color additives [5]. According to Mordor Intelligence, the market for natural colorants in 2020 was $1,625.79 million [6].

The sources of natural colorants are highly diverse, including plants, insects, and microorganisms such as bacteria and filamentous fungi [1, 7]. Among filamentous fungi, the genus Monascus is the most frequently reported producer of natural colorants [1, 8]. This genus can produce about 60 secondary metabolites, of which six are the most studied: rubropunctamine and monascorubramine (red); rubropunctatin and monascorubrin (orange); and monascin and ankaflavin (yellow) [9]. However, the simultaneous co-production of mycotoxin by Monascus sp. is considered a major concern in food, making toxin detection decisive for consumer safety [10, 11]. Thus, one of the strategies proposed to produce mycotoxin-free colorants is the use of fungi of the genera Talaromyces/Penicillium [12, 13].

Recently, Oliveira et al. [14] reported that the species Talaromyces amestolkiae can secrete five azaphilone compounds, all of them complexes of amino-hexanedioic acid with Monascus azaphilone colorants, which makes this species a promising alternative for the production of mycotoxin-free colorants. In another work, Oliveira et al. [15] described technical feasibility and production cost as important factors for the replacement of artificial colorants by natural ones. As with other biocompounds, colorant production and microorganism growth are affected by environmental conditions, nutrient sources, and microbiological conditions [1]. Additional supplements such as nitrogen or carbon sources can also enhance growth and colorant production by microorganisms [16]. Therefore, it would be highly beneficial to be able to predict the productivity of microbial colorants.

Mathematical modeling combined with several optimization techniques has been used as a tool to predict growth and bioactive product accumulation in fungal cultivation systems [17,18,19,20]. The simulation of biochemical processes can provide important information about a bioprocess. For example, the use of phenomenological kinetic models is well reported, mainly to better understand bioprocesses, optimize process conditions, and evaluate the operation mode. Manan et al. [17] fitted an unstructured model based on the logistic and Luedeking–Piret equations to experimental data from batch cultivation. The authors reported that the resulting kinetic model was suitable to describe growth, substrate consumption, and red colorant production by M. purpureus.

In addition to phenomenological models, empirical models can provide valuable information for the development and optimization of bioprocesses, such as the influence of process conditions and medium composition on the production of metabolites of interest by microorganisms [18]. In this context, statistical methods that generate polynomial models through multiple regression are widely explored in the literature as a tool to improve the production of biomolecules.

Santos-Ebinuma et al. [19] used a statistical approach through factorial design to evaluate the influence of sucrose and yeast extract, as carbon and nitrogen sources, respectively, on the production of red colorants by Penicillium purpurogenum. Oliveira et al. [15] evaluated the influence of glucose (carbon source) and monosodium glutamate (MSG, nitrogen source) in the culture medium and of the initial pH of the process, using statistical analysis techniques to fit the model and to study the influence of the process variables on red colorant production by T. amestolkiae. Zhou et al. [21] used response surface methodology (RSM) to optimize the culture medium for yellow colorant production by M. anka. Although RSM has gained much importance in optimizing colorant production, the modeling and control of biological processes require complex system models, as these are non-linear and non-deterministic systems [22, 23].

In this sense, Artificial Neural Networks (ANN) are adequate for estimating biological process variables through system modeling, and their generalization capacity allows different operation modes of the process to be estimated. ANNs have the potential to predict the output variables of a bioprocess and can also provide adjustment points to improve process performance. An ANN mimics how a biological brain works, with a network of neurons processing information and signals from neighboring neurons [23]. The MATLAB® software is suitable for modeling these systems, which have excellent generalization and data prediction capabilities [24].

ANNs have been recognized as a computational tool for modeling non-linear relationships and can be used for predictive modeling of microbial colorant production and biomass growth [22, 25, 26]. In one study, information retrieved from an ANN was used to determine the optimal operating conditions for colorant production by M. purpureus using bug-damaged wheat meal. The developed ANN had R² values of 0.993, 0.961, and 0.944 for the training, validation, and testing data sets, respectively. According to the model, the authors obtained the highest colorant production by adjusting the temperature and the agitation speed under light conditions [26]. Singh et al. [25] investigated the application of ANN to modeling red colorant production by M. purpureus and reported that the ANN model can predict the effects of cultivation parameters on red colorant production with a high correlation. To the best of our knowledge, no study has been reported so far on the production of red colorant by Penicillium/Talaromyces species using ANN.

In this sense, this work aims to develop two distinct strategies to predict red colorant production by T. amestolkiae. The fit obtained with a polynomial model from multiple regression, as employed in response surface methodology, was compared with the fit obtained with the neural network technique. The models were developed from experimental data obtained in previous research on the effect of the concentration of cultivation components and of pH on red colorant production by T. amestolkiae [15]. The ANN models thus developed would allow the identification of the cultivation parameters with the greatest influence on colorant accumulation. Furthermore, the model with the better fit to the experimental data can be used to optimize the cultivation parameters.

Materials and methods

The experimental data used in this study were obtained by Oliveira et al. [15]. The data on red colorant production by T. amestolkiae were used to fit the polynomial model obtained by multiple regression and the model obtained by ANN training. For this purpose, two modeling strategies were adopted: (i) one model for the data obtained with a 2³ full factorial design and another for the data obtained with a 2² central composite design [15], as described in Table 1; and (ii) a general model considering all the experimental data described in Table 1. Following these strategies, three ANNs were trained and their architectures defined to verify which one performs best in learning and in simulating the highest colorant production by the microorganism.

Table 1 Experimental data obtained with a 2³ full factorial design and a 2² central composite design for red colorant production by T. amestolkiae [15]

Model obtained through neural network training

To fit the model with ANN simulations, the “Neural network fitting tool” (nftool) of the MATLAB® software, which has a graphical interface, was used. The best results for the training of feedforward neural networks were given by the Levenberg–Marquardt algorithm (trainlm), with the sum of squared errors (SSE) as the performance objective function. The Gradient Descent Backpropagation algorithm (traingd) was also tested. In addition, the number of neurons in the intermediate layer, ranging from 2 to 10, and different activation functions were evaluated: the linear function (purelin) was used in the output layer, and the log-sigmoid (logsig) and hyperbolic tangent sigmoid (tansig) functions were tested in the intermediate layer.
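The network structure described above can be illustrated with a minimal forward-pass sketch. It is written in Python rather than MATLAB and is not the code used in this work; the function names, the dictionary layout of the weights, and the random initialization are assumptions made only for illustration. Inputs are assumed to be already scaled, and training (for example with Levenberg–Marquardt) is not shown.

```python
import math
import random


def logsig(n):
    """Log-sigmoid transfer function: 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + math.exp(-n))


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights and biases (illustrative layout only)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_hidden)],          # input -> hidden weights
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden biases
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden -> output
        "b2": rng.uniform(-1, 1),                  # output bias
    }


def predict(net, x):
    """Forward pass: logsig hidden layer followed by a linear output neuron."""
    hidden = [
        logsig(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    # Linear (purelin-like) output: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


if __name__ == "__main__":
    # Three inputs (e.g. MSG, pH, glucose) and two hidden neurons.
    net = init_network(n_inputs=3, n_hidden=2)
    print(predict(net, [0.1, 0.5, -0.2]))
```

With two hidden neurons the model stays small, so each prediction is a weighted sum of only two sigmoid responses plus a bias.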

To obtain a good generalization of the ANN, the data set was randomly divided into 70% of the samples for training, 15% for validation and 15% for testing, which is the standard MATLAB® configuration and the division adopted in most articles that apply ANN, such as the work of Jokić et al. [27]. The performance of the neural network, that is, how closely the model approximates the experimental data, was assessed through the correlation coefficient (R), determined by the software, and the SSE, described by Eq. 1. Three neural networks with different architectures were therefore created: two networks for strategy (i) (analysis of the 2² and 2³ designs) and one network for strategy (ii) (all points):

$$SSE = \sum {\left( {Y_{Calc} - Y_{\exp } } \right)}^2$$
(1)

where Ycalc is the colorant concentration calculated by the model and Yexp is the experimental colorant concentration.
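The data division and the two performance measures can be sketched as follows. This is an illustrative Python version, not the MATLAB routine used in this work; the function names and the rounding rule for the subset sizes are assumptions, and R is taken here to be the Pearson correlation between calculated and experimental values.

```python
import math
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly split sample indices into training, validation and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    # Whatever remains after training and validation goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def sse(y_calc, y_exp):
    """Sum of squared errors between calculated and experimental values (Eq. 1)."""
    return sum((c - e) ** 2 for c, e in zip(y_calc, y_exp))


def pearson_r(y_calc, y_exp):
    """Pearson correlation coefficient between calculated and experimental values."""
    n = len(y_exp)
    mean_c = sum(y_calc) / n
    mean_e = sum(y_exp) / n
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(y_calc, y_exp))
    var_c = sum((c - mean_c) ** 2 for c in y_calc)
    var_e = sum((e - mean_e) ** 2 for e in y_exp)
    return cov / math.sqrt(var_c * var_e)


if __name__ == "__main__":
    train_idx, val_idx, test_idx = split_indices(20)
    print(len(train_idx), len(val_idx), len(test_idx))
    # Made-up numbers, used only to exercise the functions.
    print(sse([1.0, 2.0], [1.5, 2.5]), pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With small data sets such as factorial designs, the validation and test subsets contain only a few points, so R and SSE computed on them should be read with caution.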

Adjustment of the polynomial model

The polynomial model was fitted to the experimental data by multiple regression using the Microsoft® Excel 2021 software, according to the following equation:

$$Y = \, a + bX_1 + cX_2 + dX_3 + eX_1 X_2 + fX_1 X_3 + gX_2 X_3 + hX_1 X_2 X_3 + iX_1^2 + jX_2^2 + kX_3^2$$
(2)

where X1, X2 and X3 are the evaluated variables (MSG, pH and glucose, respectively); Y is the response variable, namely red colorant production; and a, b, c, d, e, f, g, h, i, j and k are the equation parameters. In the first strategy, for the 2³ full factorial design, which includes no quadratic terms, the coefficients i, j and k are equal to zero; for the 2² central composite design, in which glucose was not varied, the coefficients d, f, g, h and k are equal to zero. In the second strategy, all the parameters were considered.

The SSE was calculated with Eq. 1. The R was determined through the multiple regression used to fit the polynomial models to the experimental data.
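As a hedged illustration of how a polynomial of the form of Eq. 2 can be fitted by multiple regression, the sketch below solves the ordinary least-squares normal equations in plain Python. It is not the spreadsheet procedure used in this work; the function names are hypothetical, the coefficient letters follow Eq. 2, and the data in the usage example are synthetic coded levels, not values from Table 1. A reduced model is obtained by listing only the active coefficients, for example leaving out the quadratic terms i, j and k.

```python
def terms(x1, x2, x3):
    """Regressors of Eq. 2, keyed by the coefficient letter that multiplies them."""
    return {
        "a": 1.0, "b": x1, "c": x2, "d": x3,
        "e": x1 * x2, "f": x1 * x3, "g": x2 * x3, "h": x1 * x2 * x3,
        "i": x1 ** 2, "j": x2 ** 2, "k": x3 ** 2,
    }


def solve(A, b):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit(points, y, active):
    """Least-squares estimate of the active coefficients via the normal equations."""
    X = [[terms(*p)[name] for name in active] for p in points]
    m = len(active)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return dict(zip(active, solve(XtX, Xty)))


def predict_polynomial(coeffs, x1, x2, x3):
    """Evaluate the fitted polynomial; coefficients not listed are treated as zero."""
    t = terms(x1, x2, x3)
    return sum(value * t[name] for name, value in coeffs.items())


if __name__ == "__main__":
    # Synthetic two-level, three-factor design in coded units (-1, +1).
    pts = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
    y = [1.0 + 0.5 * a - 0.25 * b + 0.1 * a * b for a, b, c in pts]
    print(fit(pts, y, list("abcdefgh")))  # quadratic terms i, j, k left out
```

A two-level design cannot estimate the quadratic coefficients, because the squared regressors are constant over the design points; this is why they are excluded from the list of active terms in the usage example.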

Results and discussion

Adjustment of the ANN training data

A study of the network architecture, the activation function and the training algorithm was carried out to identify the best strategy for fitting a neural network with a high capacity to predict red colorant production, considering as input variables the concentrations of glucose and MSG and the pH. Different architectures were considered to achieve a better-fitting model: the number of neurons in the intermediate layer was varied, two transfer functions were tested in the intermediate layer neurons, and the SSE and R obtained with two learning algorithms, Gradient Descent Backpropagation (GD) and Levenberg–Marquardt (LM), were evaluated. The simulations carried out with these variations, together with the results obtained, are listed in Table 2.

Table 2 Artificial neural network simulations showing the architecture in the general model and the respective parameter variations

Comparing the Gradient Descent (traingd) and Levenberg–Marquardt (trainlm) training algorithms, the LM algorithm was the better optimizer of the connection weights between neurons. It trained the neural network most efficiently, giving a higher R and a lower SSE than the Gradient Descent algorithm (Table 2). The transfer function of the intermediate layer was evaluated in the same way, according to its contribution to the minimization of the objective function. The log-sigmoid function (logsig) (Eq. 3) was better suited to this model than the hyperbolic tangent sigmoid function (tansig) (Eq. 4), since the output points fitted the experimental data best and, consequently, generated smaller errors:

$$\operatorname{logsig}(N) = \frac{1}{1 + \exp ( - N)}$$
(3)
$$\operatorname{tansig}(N) = \frac{2}{1 + \exp ( - 2N)} - 1$$
(4)

where N is the net input to the neurons of the layer, specified as an array of net input column vectors.
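For reference, the two transfer functions of Eqs. 3 and 4 can be written in a few lines of Python. This is only a sketch for scalar inputs (the MATLAB functions operate on arrays). It also checks two standard identities: tansig is numerically the hyperbolic tangent, and it is a rescaled log-sigmoid, tansig(N) = 2·logsig(2N) − 1.

```python
import math


def logsig(n):
    """Log-sigmoid (Eq. 3): maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n):
    """Hyperbolic tangent sigmoid (Eq. 4): maps any real input into (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


if __name__ == "__main__":
    for n in (-2.0, 0.0, 0.5, 3.0):
        # tansig coincides with tanh and is a shifted, rescaled logsig.
        assert abs(tansig(n) - math.tanh(n)) < 1e-12
        assert abs(tansig(n) - (2.0 * logsig(2.0 * n) - 1.0)) < 1e-12
        print(n, logsig(n), tansig(n))
```

Because the two functions differ only by a rescaling of input and output, differences in fit quality between them are expected to come mainly from how the rescaling interacts with weight initialization and training, not from a difference in what the network can represent.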

Regarding the number of neurons in the intermediate layer, two neurons gave the best result in all cases, with the highest R and the lowest SSE, so the intermediate layer was fixed at two neurons, which also favors learning speed. Considering R values above 0.9, a network with only two neurons was the most suitable for predicting the experimental data with good quality.

Following the strategies employed in this study, three neural networks were built according to the previously defined architecture: for strategy (i), one analysis model for the 2³ full factorial design and another for the 2² central composite design; and for strategy (ii), a general model. The architecture of each neural network is represented in Fig. 1. In all networks evaluated in this work, biases were included in the intermediate and output layers, since they increase or decrease the degrees of freedom of the weight adjustments [28].
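The size of such networks can be made concrete by counting their adjustable parameters. The short Python sketch below (the function name is hypothetical) counts weights and biases for a single-hidden-layer network with biases in the intermediate and output layers.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Number of weights and biases in a single-hidden-layer feedforward network."""
    hidden_layer = n_hidden * n_inputs + n_hidden    # weights + biases
    output_layer = n_outputs * n_hidden + n_outputs  # weights + biases
    return hidden_layer + output_layer


if __name__ == "__main__":
    print(count_parameters(3, 2))  # three inputs, two hidden neurons
    print(count_parameters(2, 2))  # two inputs, two hidden neurons
```

For three inputs and two hidden neurons this gives 11 parameters, the same number as the coefficients a to k of Eq. 2, so in this configuration the network is not a larger model than the full polynomial in terms of parameter count.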

Fig. 1

Schematic representation of the ANN models: the analysis model for the 2³ full factorial design and the general (mixed) model (a), and the analysis model for the 2² central composite design (b). The neurons identified as b1 and b2 are the biases of the intermediate and output layers, respectively

The neural network for the 2² design analysis model fitted the data well, with R equal to 0.997 and a low SSE. Likewise, the neural network for the 2³ design analysis model obtained R equal to 0.996 and a low SSE. The artificial neural network of the general model, which considers all experimental data, also performed well, with R equal to 0.984 and SSE equal to 0.024. The fits of the artificial neural networks for the 2², 2³ and general analysis models are shown in Fig. 2.

Fig. 2

Linear regression of the fits of the analysis models obtained by artificial neural network simulation in MATLAB: 2² central composite design (a); 2³ full factorial design (b); general model (c)

The results presented in Fig. 2 show that the artificial neural network yielded a model capable of simulating values close to the experimental data under both fitting strategies. The analysis therefore suggests a robust model that fits the experimental data well.

In summary, the evaluation of the architecture, the activation function and the training algorithm showed that an artificial neural network with two neurons in the hidden layer described the process well. The best activation functions were logsig for the hidden layer and purelin for the output layer, and the best training algorithm was Levenberg–Marquardt.

Adjustment of the polynomial model

To build the polynomial models for strategy (i), two empirical models were fitted by multiple regression. The models obtained are described by Eq. 5 (analysis model for the 2³ full factorial design) and Eq. 6 (analysis model for the 2² central composite design):

$$Y = - 1.079 + 0.122*pH + 0.136*MSG + 0.183*Gly - 0.015*pH*MSG0.0207*pH*Gly + 0.0001*pH*MSG*Gly$$
(5)
$$Y = - 9.151 + 5.057*pH - 0.068*MSG + 0.093*pH*MSG - 0.795*pH^2 - 0.006*MSG^2$$
(6)

With these models, it was possible to obtain results similar to those achieved by Oliveira et al. [15], with an R equal to 0.998 and an SSE equal to 0.012 for the analysis model of the 2³ full factorial design. The model fitted to the 2² central composite design data obtained an R equal to 0.965 and an SSE of 0.594.
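As an illustration of how a fitted quadratic model can be used for optimization, the sketch below locates the stationary point of Eq. 6 as printed, by setting both partial derivatives to zero and solving the resulting 2 × 2 linear system. It uses the rounded coefficients shown in Eq. 6, so the result is approximate, and it does not check whether the point lies inside the experimental range of Table 1. The function names are hypothetical.

```python
def eq6(ph, msg):
    """Polynomial model of Eq. 6 with the rounded coefficients as printed."""
    return (-9.151 + 5.057 * ph - 0.068 * msg + 0.093 * ph * msg
            - 0.795 * ph ** 2 - 0.006 * msg ** 2)


def stationary_point():
    """Solve dY/dpH = 0 and dY/dMSG = 0 for Eq. 6 using Cramer's rule.

    dY/dpH  =  5.057 + 0.093*MSG - 1.590*pH  = 0
    dY/dMSG = -0.068 + 0.093*pH  - 0.012*MSG = 0
    """
    a11, a12, b1 = -1.590, 0.093, -5.057
    a21, a22, b2 = 0.093, -0.012, 0.068
    det = a11 * a22 - a12 * a21   # also the determinant of the Hessian
    ph = (b1 * a22 - a12 * b2) / det
    msg = (a11 * b2 - a21 * b1) / det
    # Negative leading Hessian entry with positive determinant means a maximum.
    is_maximum = a11 < 0 and det > 0
    return ph, msg, is_maximum


if __name__ == "__main__":
    ph, msg, is_max = stationary_point()
    print(ph, msg, is_max, eq6(ph, msg))
```

Under these assumptions the stationary point is a maximum of the fitted surface; any such optimum should be confirmed experimentally before being used as an operating condition.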

For strategy (ii), an empirical polynomial model was fitted to all the data from Oliveira et al. [15]. Equation 7 shows the model fitted to the experimental data:

$$Y = - 8.722 + 2.530*pH + 0.122*MSG - 0.003*Gly - 0.015*pH*MSG - 0.04017*pH*Gly + 0.0007*pH*MSG*Gly - 0.172*pH^2 - 0.0005*MSG^2 + 0.032*Gly^2$$
(7)

The polynomial model gave a high SSE, equal to 4.039, and an R equal to 0.871. These values represent the lowest correlation and the greatest difference between experimental and simulated data among all the tests evaluated.

Figure 3 shows the comparison between the experimental data and the results obtained by the polynomial models.

Fig. 3

Representation comparing the experimental data and the ones achieved by the simulation method of artificial neural networks in MATLAB for the 2³ full factorial design (a); 2² central composite design (b); all the results (c)

Thus, the polynomial model obtained through multiple regression fitted the data well under strategy (i). Under strategy (ii), however, it gave the worst fit of all the models evaluated, with the lowest correlation coefficient and the largest SSE. For the statistical analysis of the results, Oliveira et al. [15] first analyzed the 2³ full factorial design, which considered the concentrations of MSG and glucose and the pH, and then the 2² central composite design, which considered only the MSG concentration and the pH. The fact that glucose was not considered in the second set of experiments likely contributed to the poor fit of the polynomial model under strategy (ii).

Models analysis

Table 3 shows the R and SSE values obtained by fitting the evaluated models to the experimental data described by Oliveira et al. [15]. For strategy (i), the R values for the 2³ and 2² designs are high, above 0.9, and low SSE values were obtained in both cases, for the polynomial model (PM) and for the ANN models.

Table 3 Results of the polynomial model (PM) fits and of the training of the artificial neural networks (ANN)

However, for strategy (ii), only the ANN model achieved a correlation coefficient above 0.9 together with a low SSE. The PM fitted to the experimental data through multiple regression presented the worst R, below 0.9, and the highest SSE value.

Comparing the fits obtained with the PMs and the ANN models, it was found that the output values of the ANN are closer to the experimental data in all sets of results evaluated (Figs. 2 and 3), presenting a higher value of R and a lower value of SSE in all selected artificial neural networks. This comparison highlights the potential of artificial neural networks, and of the architecture adopted, to predict the bioprocess evaluated in this study.

The main advantage of the ANN over the PM is its higher prediction accuracy. The ANN can be more sensitive in predicting the initial conditions that improve the production of the final product. Another advantage is that the model is non-linear and can therefore better capture the variation of the microorganism's metabolism during the production process and, thus, give a better prediction of the final production. In addition, the ANN can assist in the prediction and/or optimization of colorant production or of any other bioprocess [29].

One of the main problems in the bioprocess industry is following the production process of a microorganism, and the ANN has proved able to predict this production with better precision than the most widely used model, the PM. In this way, the trained networks can be used to simulate the biological process and to adjust the initial conditions to obtain a better output.

Comparing the surface graphs described by the polynomial models (Fig. 4a, c and e) with those obtained using artificial neural networks (Fig. 4b, d and f), similar trends but different profiles were observed. In the graphs simulated by the ANN, colorant production occurs over a narrower MSG concentration range than in those of the polynomial model. Regarding pH, the colorant production predicted by the neural network model was higher than that predicted by the polynomial model. This can be explained by the fact that artificial neural networks are composed of activation functions that vary abruptly from zero to one, which can help to identify regions in which a given input variable has a greater influence. A polynomial model presents smoother curves, which makes it difficult to identify a region of greater influence of a given variable.

Fig. 4

Response surfaces generated by: the empirical model of Eq. 5 for a glucose concentration of 10 g/L (data from the 2³ design) (a); the trained artificial neural network for a glucose concentration of 10 g/L (data from the 2³ design) (b); the empirical model of Eq. 6 (data from the 2² design) (c); the trained artificial neural network for the data from the 2² design (d); the empirical model fitted to all experimental data (e); the trained artificial neural network for the model fitted to all experimental data (f)

Regarding pH, the results simulated by both models also showed the same profile and trend, indicating the use of more alkaline media to produce red colorants. For the models fitted to the data from the 2² design, the results simulated by the artificial neural network show a wider pH range in which the greatest colorant production occurs. In this case, the colorant concentrations simulated by the artificial neural network ranged from 1.5 to 3.5 uABS and those simulated by the polynomial model from 1 to 2.25 uABS, considering the best MSG condition and a pH range between 2.5 and 5.4.

No graphs were plotted for glucose concentration, as the effect of glucose was not considered significant in the range evaluated, a result described by Oliveira et al. [15]. In all strategies used in this work, the ANN models fitted the experimental results better than the polynomial models. Therefore, ANN models can be used for predicting results as well as for optimizing process conditions, making it possible to obtain better results than those determined through polynomial models.

Conclusions

This work indicates that the application of artificial neural networks to predict the production of secondary metabolites by filamentous fungi is a promising technique. The artificial neural network proved robust when fitted to the experimental data under both strategies addressed in this study. The results also show that the red colorant concentrations predicted by the artificial neural networks are closer to the experimental data than those predicted by the polynomial models fitted by multiple regression, which did not present a good fit in strategy (ii), in which all results were considered.

It is also noteworthy that the same number of experimental results was used to fit the polynomial models and the ANNs, so no experiments beyond those already required by the experimental designs were needed. Considering the two strategies evaluated, a simple ANN with two to three input neurons (MSG and glucose concentrations and pH) and two neurons in the intermediate layer fitted the experimental results of the red colorant production bioprocess well. Thus, this work suggests that ANN can be a useful tool for the optimization of experimental conditions, since its predicted values are closer to the experimental ones.