Introduction

Among nutrients, nitrogen is one of the main factors limiting plant growth and development. Many compounds essential to plant life contain this element, including proteins, nucleic acids and chlorophyll. Inorganic nitrogen, such as nitrate and ammonium, is incorporated into organic compounds that are used by the plant cell. In typical regeneration media, most nitrogen is supplied in the ionic forms of ammonium and nitrate (Ramage and Williams 2002). The optimal total nitrogen concentration (TNC) and nitrate:ammonium ratio (NO3/NH4) in the medium depend on the type of tissue, the genetic material and the incubation conditions (Poothong and Reed 2016).

To understand the effect of a factor on a process, it is necessary to build a model that predicts the response from the values of the independent variables. Multiple regression analysis (MRA) is a statistical technique that estimates the relationship between a dependent variable and several independent variables and expresses it as a linear equation (Uyanik and Guler 2013).

Artificial neural networks (ANNs) are another option, so far little used, for modeling in vitro processes in plants. They are a non-parametric technique and therefore do not require assumptions such as normality and linearity, so they can detect non-linear effects that other statistical techniques, such as multiple regression, may fail to capture.

ANNs are computational models that acquire and store information through processing units (artificial neurons) with multiple interconnections (Da Silva et al. 2017). Among the types of neural networks, the multilayer perceptron is a feedforward network of interconnected neurons whose connection weights are trained with the backpropagation algorithm. These networks learn to transform input data (independent variables) into a given response (dependent variable) (Panchal et al. 2011). The input layer holds the input variables in separate neurons, while the output layer holds the response variables. Between the input and output layers lie the hidden layers, which contain a variable number of interconnected neurons and a constant neuron related to the intercept synapses, which are not directly influenced by any input variable (Günther and Fritsch 2010).

The aim of this work was to compare the predictive capacity of models obtained by multiple linear regression and by artificial neural networks for Pinus taeda in vitro organogenesis processes.

Materials and methods

Plant material consisted of mature zygotic embryos collected and isolated from a clonal seed orchard of Pinus taeda L. (Livingston Parish) located at 27°59′0.4″S, 55°58′6″W. Half-strength Murashige and Skoog (1962) semisolid medium (agar 6.5 g L−1) with different TNC (5, 10, 20 and 30 mM) and NO3/NH4 (0.5, 1, 2 and 3) (Table 1), supplemented with sucrose (30 g L−1), thidiazuron (0.45 mM) and 6-benzylaminopurine (0.44 mM), was used.

Table 1 NO3/NH4, TNC tested and salt concentration in the treatments

The isolated mature embryos were incubated under light (116 µmol m−2 s−1 PPFD, 14 h photoperiod) and temperature (27 ± 2 °C) controlled conditions. After 35 days of incubation the regeneration rate, oxidation rate, callus proliferation, number of buds originated per explant and bud-forming capacity (BFC) index were measured. BFC index was calculated as follows:

$${\text{BFC index}}\,=\,\frac{{{\text{Regeneration rate}}\, \times \,{\text{number of buds per explant}}}}{{100}}$$
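As a quick worked example of the BFC index formula, the following minimal Python sketch applies it to one treatment. The function name `bfc_index` and the input values (80 % regeneration, 5 buds per explant) are hypothetical and chosen only for illustration; they are not data from this study.

```python
def bfc_index(regeneration_rate_pct: float, buds_per_explant: float) -> float:
    """Bud-forming capacity index: regeneration rate (%) x buds per explant / 100."""
    return regeneration_rate_pct * buds_per_explant / 100.0


# Hypothetical treatment: 80 % of explants regenerated, 5 buds per explant on average.
example = bfc_index(80.0, 5.0)
print(example)  # 80 * 5 / 100
```

The index weights the number of buds by the fraction of explants that responded, so a treatment with many buds on few responsive explants is not overrated.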

A completely randomized factorial design with 16 treatments, five repetitions and an experimental unit of ten explants was used. MRA and ANNs were used to model the nitrogen effect; three repetitions were used to build the models and two to test them. R 3.0.2 (R Core Team 2013) was used with the “MASS” (Venables and Ripley 2002) and “rsm” (Lenth 2009) packages for MRA and the “neuralnet” package (Fritsch et al. 2016) to build a multilayer perceptron neural network. In MRA, a full cubic model (formula 1) was simplified by retaining only the statistically significant terms (p < 0.05), where NAR is the NO3/NH4 ratio and TNC is the total nitrogen concentration in mM. Box-Cox transformations were applied to all the variables because they did not comply with the assumptions for parametric analysis (normal distribution and homogeneous variance) (Box and Cox 1964).

$${\text{Response}}\,=\,{\text{cte}}\,+\,{\upbeta}_{1}({\text{NAR}})\,+\,{\upbeta}_{2}({\text{TNC}})\,+\,{\upbeta}_{3}({\text{NAR:TNC}})\,+\,{\upbeta}_{4}({\text{NAR}}^2)\,+\,{\upbeta}_{5}({\text{TNC}}^2)\,+\,{\upbeta}_{6}({\text{NAR}}^2{:}{\text{TNC}})\,+\,{\upbeta}_{7}({\text{NAR:TNC}}^2)\,+\,{\upbeta}_{8}({\text{NAR}}^3)\,+\,{\upbeta}_{9}({\text{TNC}}^3)$$
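To make the structure of the cubic model concrete, the sketch below expands one (NAR, TNC) pair into the ten regressors of formula 1 and evaluates a prediction. It is written in Python for illustration only (the analysis itself was done in R), and it does not fit anything: the names `cubic_terms` and `predict` and the coefficient values are hypothetical. Dropping a non-significant term corresponds to setting its coefficient to zero.

```python
from itertools import product

NAR_LEVELS = [0.5, 1, 2, 3]   # NO3/NH4 ratios tested (from the text)
TNC_LEVELS = [5, 10, 20, 30]  # total nitrogen concentrations in mM (from the text)


def cubic_terms(nar, tnc):
    """Regressors of the full cubic model, in the order of formula 1 (constant first)."""
    return [
        1.0,            # cte
        nar,            # NAR
        tnc,            # TNC
        nar * tnc,      # NAR:TNC
        nar ** 2,       # NAR^2
        tnc ** 2,       # TNC^2
        nar ** 2 * tnc, # NAR^2:TNC
        nar * tnc ** 2, # NAR:TNC^2
        nar ** 3,       # NAR^3
        tnc ** 3,       # TNC^3
    ]


def predict(coefs, nar, tnc):
    """Model response for one treatment, given ten coefficients (cte, beta_1..beta_9)."""
    return sum(b * x for b, x in zip(coefs, cubic_terms(nar, tnc)))


# The 4 x 4 factorial gives the 16 treatments of the design.
treatments = list(product(NAR_LEVELS, TNC_LEVELS))

# Hypothetical coefficients; the four cubic terms are "dropped" by setting them to 0.
coefs = [1.0, 0.5, 0.1, -0.02, -0.1, -0.002, 0.0, 0.0, 0.0, 0.0]
print(len(treatments), predict(coefs, 2, 10))
```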

Residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots were used to check the model assumptions: normal distribution, homogeneity of variance and linearity.
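For readers unfamiliar with the Box-Cox transformation applied to the responses before MRA, a minimal Python sketch of the standard one-parameter form is given here. The λ values used in this study are not reported in the text, so the values below are hypothetical, and the function name `box_cox` is chosen only for illustration. The transformation is defined for strictly positive values; how zero responses were handled is not stated, so the sketch simply rejects them.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """One-parameter Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical responses and lambda, for illustration only.
responses = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(y, 0.5) for y in responses]
print(transformed)
```

With λ = 1 the data are only shifted, with λ = 0.5 the transform behaves like a square root and with λ = 0 like a logarithm, which is why a suitable λ can reduce skewness and stabilize the variance.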

Neural networks were built with an input layer of two neurons (one per independent variable), a hidden layer of three neurons and a single output neuron (the response variable being modeled) (Fig. 1). Beforehand, the variables were rescaled to the range 0 (minimum value) to 1 (maximum value).

Fig. 1 Artificial neural network model and its components
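The 2-3-1 architecture and the 0 to 1 rescaling described above can be sketched as a single forward pass in Python. This is an illustration under stated assumptions, not the trained networks of this study: the weights are hypothetical, the logistic hidden activation and the linear output neuron are assumptions, and training by backpropagation is not shown. Each weight tuple starts with the weight of the constant (intercept) neuron.

```python
import math


def min_max(x, lo, hi):
    """Rescale x to [0, 1], with lo mapped to 0 and hi mapped to 1."""
    return (x - lo) / (hi - lo)


def logistic(z):
    """Logistic activation (assumed here for the hidden neurons)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(nar_s, tnc_s, w_hidden, w_out):
    """Forward pass of a 2-3-1 perceptron on scaled inputs.

    w_hidden: three (bias, weight_nar, weight_tnc) tuples, one per hidden neuron.
    w_out:    (bias, w1, w2, w3) for the single, linear output neuron.
    """
    hidden = [logistic(b + w1 * nar_s + w2 * tnc_s) for b, w1, w2 in w_hidden]
    bias, *weights = w_out
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical weights, for illustration only.
W_HIDDEN = [(0.1, 0.8, -0.5), (-0.3, 0.4, 0.9), (0.2, -0.7, 0.6)]
W_OUT = (0.05, 0.6, -0.4, 0.3)

# Treatment with NO3/NH4 = 2 and TNC = 10 mM, scaled with the tested ranges.
nar_s = min_max(2, 0.5, 3)
tnc_s = min_max(10, 5, 30)
y_scaled = forward(nar_s, tnc_s, W_HIDDEN, W_OUT)
print(nar_s, tnc_s, y_scaled)
```

Because the response is also scaled before training, a prediction such as `y_scaled` would have to be mapped back to the original units by inverting the min-max step.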

The predictive capacity of the models was evaluated with the coefficient of determination (R2), Pearson’s correlation coefficient (r) and the root mean square error (RMSE), computed on values that were not used to generate the models, with the following formulas:

$${{\text{R}}^2}=1 - \frac{{\mathop \sum \nolimits_{{{\text{i}}=1}}^{{\text{n}}} {{\left( {{{\text{Y}}_{\text{i}}} - {{\text{Y}}_{\text{p}}}} \right)}^2}}}{{\mathop \sum \nolimits_{{{\text{i}}=1}}^{{\text{n}}} {{\left( {{{\text{Y}}_{\text{i}}} - {{\overline {{\text{Y}}} }_{\text{i}}}} \right)}^2}}}$$
$${\text{r}} = \frac{{{\text{n}}\cdot\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{i}}} } \cdot{\text{Y}}_{{\text{p}}} - \sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{i}}} } \cdot\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{p}}} } }}{{\sqrt {\left[ {{\text{n}}\cdot\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{i}}} ^{2} } - \left( {\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{i}}} } } \right)^{2} } \right]} \cdot \left[ {{\text{n}}\cdot\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{p}}} ^{2} } - \left( {\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{Y}}_{{\text{p}}} } } \right)^{2} } \right]}}$$
$${\text{RMSE}}=\sqrt {\frac{{\mathop \sum \nolimits_{{{\text{i}}=1}}^{{\text{n}}} {{\left( {{{\text{Y}}_{\text{i}}} - {{\text{Y}}_{\text{p}}}} \right)}^2}}}{{\text{n}}}}$$

where Yi are the experimental values used to evaluate the model, Yp are the corresponding predicted values, \({\overline {{\text{Y}}} _{\text{i}}}\) is the mean of the experimental data and n is the number of experimental data points.
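The three formulas above translate directly into code. The following Python sketch implements them term by term; the function names and the observed and predicted values are hypothetical and serve only to show the calculation, not to reproduce Table 3.

```python
import math


def r_squared(obs, pred):
    """R2 = 1 - sum (Yi - Yp)^2 / sum (Yi - mean(Yi))^2."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((y - p) ** 2 for y, p in zip(obs, pred))
    ss_tot = sum((y - mean_obs) ** 2 for y in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(obs, pred):
    """Pearson's r between observed and predicted values (computational formula)."""
    n = len(obs)
    s_y, s_p = sum(obs), sum(pred)
    s_yp = sum(y * p for y, p in zip(obs, pred))
    s_yy = sum(y * y for y in obs)
    s_pp = sum(p * p for p in pred)
    num = n * s_yp - s_y * s_p
    den = math.sqrt((n * s_yy - s_y ** 2) * (n * s_pp - s_p ** 2))
    return num / den


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))


# Hypothetical held-out observations and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), pearson_r(observed, predicted), rmse(observed, predicted))
```

Note that R2 computed this way on test data is not simply the square of r: it penalizes systematic bias in the predictions and can even be negative, whereas r only measures linear association.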

Results and discussion

Both the TNC and the NO3/NH4 influenced the morphogenetic responses (Fig. 2). The numbers represent the treatments described in Table 1. It was observed that treatments with low TNC and high NO3/NH4 favored the formation of calluses, while high TNC and NO3/NH4 decreased the regeneration of the buds. Influences of TNC and NO3/NH4 on the induction and differentiation of plant cell cultures have been reported for some in vitro systems (Kovalchuk et al. 2018; Poothong and Reed 2016; Wada and Reed 2015).

Fig. 2 Different types of explant response after 35 days of incubation. The illustrations are arranged according to treatment number. In all cases, bars indicate 2 mm

Table 2 shows the regression coefficients, significance (based on a t-test), determination coefficients (R2) and adjusted R2 for the models obtained by MRA.

Table 2 Regression coefficients, coefficients of determination (R2) and adjusted R2 (adj. R2) for models developed by MRA

Optimal bud regeneration occurred in the range of 10–20 mM TNC and NO3/NH4 of 1–2 in both the MRA and ANN models (Figs. 3, 4).

Fig. 3 Contour graph and basic diagnostic plots for regeneration rate obtained by MRA

Fig. 4 Plot of neural networks including trained synaptic weights and the contour graph of the regeneration rate obtained by ANNs

Higher concentrations of nitrogen produced an increase in the oxidation rate (Figs. 5, 6), while the low NO3/NH4 favored the formation of calluses (Figs. 7, 8), to the detriment of the production of buds.

Fig. 5 Contour graph and basic diagnostic plots for oxidation rate obtained by MRA

Fig. 6 Plot of neural networks including trained synaptic weights and the contour graph of the oxidation rate obtained by ANNs

Fig. 7 Contour graph and basic diagnostic plots for callus proliferation rate obtained by MRA

Fig. 8 Plot of neural networks including trained synaptic weights and the contour graph of the callus proliferation rate obtained by ANNs

The number of buds per explant was greatest at NO3/NH4 of 1–2 in both models, with TNC between 20 and 30 mM for MRA and between 10 and 20 mM for ANNs (Figs. 9, 10).

Fig. 9 Contour graph and basic diagnostic plots for buds per explant obtained by MRA

Fig. 10 Plot of neural networks including trained synaptic weights and the contour graph of the number of buds per explant obtained by ANNs

BFC index was highest in the range of 10–20 mM TNC and NO3/NH4 of 1–2 in both models (Figs. 11, 12).

Fig. 11 Contour graph and basic diagnostic plots for BFC index obtained by MRA

Fig. 12 Plot of neural networks including trained synaptic weights and the contour graph of the BFC index obtained by ANNs

The NO3/NH4 ratio in the culture medium affects the activity of growth regulators, and the cytokinin requirement for meristematic activity is lower when the content of reduced nitrogen is reasonably high (George et al. 2008).

The residuals vs fitted plot is used to detect nonlinearity and unequal error variances. The normal quantile–quantile (normal Q-Q) plot is a graphical technique for checking whether a variable follows a normal distribution. The scale-location plot shows the square root of the absolute standardized residuals as a function of the fitted values. The residuals vs leverage plot helps to identify influential data points in the model. The transformed regeneration and oxidation rates fully met the requirements of parametric analysis. This is seen in the residuals vs fitted plots, where the trend is a horizontal line close to 0, and in the alignment of the points in the normal Q-Q plots, indicating normal distribution, homogeneity of variance and linearity. The transformed callus proliferation rate, number of buds and BFC index did not show normal distribution, homogeneity of variance or linearity.

Table 3 shows the observed values, the values predicted by both models, and the coefficients of determination (R2), Pearson’s correlation coefficients (r) and root mean square errors (RMSE) obtained with the test values for all the evaluated variables. Models obtained by ANNs for the regeneration rate, callus proliferation, number of buds per explant and BFC index had high r (> 0.9) and R2 (> 0.8), while the oxidation rate showed a very low R2 for both models (< 0.3). For all the variables evaluated, r and R2 were higher and RMSE was lower in the models obtained by ANNs than in those obtained by MRA. R2 is widely used to understand the sources of variation, since it represents the proportion of the variance explained by a given model (Nakagawa et al. 2017). Correlation, on the other hand, is a measure of the association between two variables, which can be positive or negative; one way to measure it is the Pearson correlation coefficient (Emerson 2015). The smaller the RMSE, the greater the prediction capacity of the model, because the difference between the values predicted by the model and the values observed in the experiment is smaller. These results indicate that the models obtained by ANNs have better predictive capacity, since their predicted values are closer to those observed. It is also interesting to note that for regeneration and oxidation, which comply with the assumptions of parametric analysis, the MRA and ANN models have similar prediction capacity, while the prediction capacity of the ANN models is notably greater for variables with a non-linear nature.

Table 3 Observed values (Yi), values predicted by MRA (YMRA) and by ANNs (YANNs), coefficients of determination (R2), Pearson’s correlation coefficients (r) between the observed and predicted values and root mean square errors (RMSE) for all response variables

Biological processes, such as organogenesis, are non-linear in nature due to their complexity, since they depend on multiple factors and their interactions (Gallego et al. 2011). Various nonparametric analyses have been used for mineral optimization of in vitro cultures, such as Chi-squared automatic interaction detection (CHAID) analysis (Akin et al. 2017), Classification and Regression Tree (CART) analysis (Kovalchuk et al. 2017) and neurofuzzy logic (Alanagh et al. 2014).

Sarve et al. (2015) compared the prediction capacity of response surface models (RSM) and ANNs for the synthesis of biodiesel from sesame oil and concluded that ANNs had better prediction capacity, with higher R2 and lower RMSE. Likewise, Astray et al. (2016) compared models obtained by RSM and by ANNs to optimize the production of oligosaccharide mixtures from sugar beet pulp; the ANN models improved on the RSM models by between 5.58 and 61.78%.

Gago et al. (2010) reached the same conclusion using traditional statistical analysis and ANNs to model the in vitro proliferation of kiwifruit. The ANN methodology is easy to use, does not require the assumptions of traditional statistical analysis (regression analysis and ANOVA, for example) and allows modeling with a limited number of experiments.

Other advantages of ANNs over traditional statistical analysis are the ability to process many types of data at the same time (continuous, discrete and binomial variables), which allows complex models to be built, and the absence of any requirement for a specific experimental design, which allows previously generated data to be used (Gallego et al. 2011).

In conclusion, both TNC and NO3/NH4 influenced the morphogenetic responses, and the artificial neural network models predicted the different responses more accurately, with higher coefficients of determination and correlation and a lower root mean square error for all the variables studied.