1 Introduction

Leucine, isoleucine, and valine have similar structures and are commonly referred to as the branched-chain amino acids (BCAA). Antagonism among these amino acids has been established in avian species such as chickens and turkeys [1]. Studies of BCAA interactions have shown that dietary leucine content influences both the valine and isoleucine requirements of broilers. Dietary protein is often the most expensive component of a broiler diet. Reducing dietary protein to decrease production costs can be accomplished via supplementation with crystalline amino acids such as lysine, methionine, and threonine. In low-protein diets, adequate dietary valine is critical for supporting optimal growth, feed conversion, and carcass traits [2].

Mathematical modeling is a powerful tool for extracting further information from existing data and for orienting future research programs in animal nutrition. Among the different mathematical approaches, neural networks (NN) have become popular for prediction and forecasting in a broad range of sciences [3]. Similar to the physiological nervous system, a NN model can 'learn' and can therefore be trained to find solutions, recognize patterns, classify data, and forecast future events [4].

A multilayer perceptron (MLP) network, sometimes called a back-propagation (BP) network, is probably the most popular NN for nonlinear mapping and has been referred to as a 'universal approximator' [5]. It consists of an input layer, one or more hidden layers, and an output layer. The network needs to be trained using a training algorithm (e.g., back-propagation, cascade correlation, conjugate gradient). In radial basis function (RBF) networks, as the name implies, a radially symmetric basis function is used as the activation function for the hidden nodes. Training of the network parameters (weights) between the hidden and output layers occurs in a supervised fashion based on target outputs [6]. Despite frequent reports of successful applications of NN for modeling purposes, concerns still exist about their construction and exploitation. The main concern relates to network structure: what the optimal NN structure is, and how it can be determined, remain issues requiring further investigation.
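To make the layered structure concrete, the following minimal sketch (an illustration under assumed layer sizes, not code from the study) computes one forward pass of an MLP with four inputs, a hyperbolic-tangent hidden layer, and a linear output:

```python
# A minimal sketch (illustrative only, not the study's code) of one forward
# pass through an MLP: 4 inputs -> tanh hidden layer -> linear output.
# Weights are random placeholders; a training algorithm such as
# back-propagation would adjust them to fit target data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=4)                               # e.g., protein + 3 BCAA
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)  # input -> hidden
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)  # hidden -> output

hidden = np.tanh(W1 @ x + b1)   # nonlinear (tanh) hidden activations
output = W2 @ hidden + b2       # linear output, e.g., a predicted response
print(float(output[0]))
```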

In recent years, the support vector machine (SVM) has been introduced as a new technique for solving a variety of learning, classification, and prediction problems [7]. Support vector regression (SVR), the regression version of SVM, has been developed recently to estimate regression relationships. As with SVM, SVR is capable of solving nonlinear problems using kernel functions [3]. SVR is a nonparametric statistical learning technique in which no assumption is made about the underlying data distribution [7, 8]. Due to its capability to generalize, SVR has attracted the attention of many researchers and has been applied successfully in a range of disciplines [9].

This study set out to predict the response of broiler chickens [average daily gain (ADG), feed efficiency (FE), and feed intake (FI)] to different levels of dietary protein and BCAA. For this purpose, SVR response models with different kernel functions were developed and their predictive performance compared with that of MLP and RBF types of NN model. The reader is referred to two recent textbooks for review of the pros and cons of the two approaches [10, 11].

2 Materials and methods

This study was carried out in four stages: (i) compilation of data required for training and testing; (ii) training of the different SVR and NN models; (iii) testing the models generated with a data set not used in training; and (iv) assessing their performance and comparing the predictive ability of SVR and NN models.

2.1 Data source

Seven data sets were extracted from the literature, providing 241 data lines (pens of birds, average number of birds per pen = 22). The input variables were dietary protein, valine, isoleucine, and leucine, all expressed as g/kg of feed, and the output variables were ADG (g/bird per day), FE (g gain per g feed intake), and FI (g/bird per day). The data were selected based on the following criteria: (i) only peer-reviewed published papers were used; (ii) the data came from studies in which the BCAA concentrations in the diets were used as treatments; (iii) the dietary levels of protein and of the investigated amino acids were clearly defined; (iv) ADG, FE, and FI were reported or could be calculated from the published data; and (v) all experimental data were collected from studies conducted during the first 21 days of age of broiler chickens. A brief description of the seven data sources used in this study is given in Table 1. The complete data set was randomly divided into training and testing subsets (50% each). The same data were used to train the SVR and NN models, and the models developed were tested on identical data; therefore, any differences in the predictive performance of the models are attributable to the models themselves rather than to the data. The ranges of data used to develop the NN and SVR models are summarized in Table 2, and the correlation matrix for the variables used in the study is shown in Table 3.
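For illustration, the 50/50 random division described above could be reproduced as follows; this is a hedged sketch in Python/scikit-learn (the study used Statistica), and the file and column names are hypothetical:

```python
# Hypothetical reconstruction of the data split; the file and column names
# are assumptions for illustration, not the study's actual identifiers.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("bcaa_broiler_data.csv")               # 241 data lines
X = data[["protein", "valine", "isoleucine", "leucine"]]  # inputs, g/kg feed
y = data[["adg", "fe", "fi"]]                             # output variables

# Random 50/50 split (the paper used 120 training / 121 testing lines)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, shuffle=True, random_state=42)
```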

Table 1 Data sets used in this study
Table 2 Ranges of the data used to develop the SVR and NN models
Table 3 Correlations among the variables used in this study

2.2 Neural network modeling

The collected experimental data were used to train and test two NN types (MLP and RBF) for predicting responses of broiler chickens to dietary levels of protein and BCAA. The 241 data lines assembled were shuffled, and 120 were used for the learning process and 121 for testing. The data set was imported into the Statistica Neural Networks software version 8 [12]. The data lines were the same as those used to develop the SVR models. ADG, FE, and FI were considered as dependent variables and dietary protein, valine, isoleucine, and leucine as independent variables. A common problem in NN training is over-fitting [13]: a network with a large number of weights relative to the number of available training cases may achieve a low training error simply by fitting noise in the training data, while generalizing poorly to new cases. To circumvent this issue, the optimum architecture (i.e., the number of hidden neurons in the network and the training algorithm) was determined using the algorithms integrated within the 'intelligent problem solver' module of the Statistica software [12]. A schematic diagram of the NN models developed in this study is shown in Fig. 1a.
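Statistica's 'intelligent problem solver' is proprietary, but a rough analogue of its search over hidden-layer sizes can be sketched as follows (an assumption about the procedure, scoring each candidate size by cross-validated R² and keeping the best):

```python
# A rough, assumed analogue of an architecture search: try several hidden
# layer sizes and keep the one with the best cross-validated R^2 score.
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def pick_hidden_size(X, y, candidates=range(2, 13)):
    best_size, best_score = None, -float("inf")
    for h in candidates:
        mlp = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                           solver="lbfgs",  # a quasi-Newton training method
                           max_iter=2000, random_state=0)
        # y is a single output (e.g., ADG); each response is modeled separately
        score = cross_val_score(mlp, X, y, cv=5, scoring="r2").mean()
        if score > best_score:
            best_size, best_score = h, score
    return best_size
```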

Fig. 1 Schematic diagram of the models used in this study: a NN and b SVR

2.3 Support vector regression

Here a brief description of SVR is given, following [14]. Like most linear regression procedures, the SVR algorithm, developed by [8], is based on estimating a linear regression function:

$$f(\mathbf{x}) = \mathbf{w}^{\text{T}} \mathbf{x} + b \quad \left( \mathbf{w}, \mathbf{x} \in R^{d}, \; \text{the } d\text{-dimensional input space} \right)$$
(1)

where w and b are the slope (weight vector) and offset of the regression function, respectively. The regression function (Eq. 1) is calculated by minimizing:

$$\frac{1}{2}{\mathbf{w}}^{\text{T}} {\mathbf{w}} + \frac{1}{n}\sum\limits_{i = 1}^{n} {c(f({\mathbf{x}}_{{\mathbf{i}}} ),y_{i} )}$$
(2)

where \(\frac{1}{2}\left\| {\mathbf{w}} \right\|^{2}\) is the term characterizing model complexity (i.e., smoothness of \(f({\mathbf{x}})\), where \(\left\| \cdot \right\|\) denotes vector length), and \(c(f({\mathbf{x}}_{i} ),y_{i} )\) is the loss function, which determines how the distance between \(f({\mathbf{x}}_{i} )\) and the target value \(y_{i}\) should be penalized. In this primal formulation, several different loss functions are available, but in this paper we adopt the commonly used \(\varepsilon\)-insensitive loss function introduced by [7]. This loss function is given by:

$$c(f(\mathbf{x}_{i}), y_{i}) = \begin{cases} 0 & \text{if } \left| y_{i} - f(\mathbf{x}_{i}) \right| \le \varepsilon \\ \left| y_{i} - f(\mathbf{x}_{i}) \right| - \varepsilon & \text{otherwise} \end{cases}$$
(3)

Equation 3 defines a tube of radius \(\varepsilon\) around the hypothetical regression function: if a data point lies within this tube the loss is zero, while if a data point lies outside the tube, the loss equals the distance between the data point and the boundary of the tube, i.e., \(\left| y_{i} - f(\mathbf{x}_{i}) \right| - \varepsilon\).
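As a worked example, Eq. 3 transcribes directly into code (the \(\varepsilon\) value and residuals below are arbitrary illustrations):

```python
# Epsilon-insensitive loss (Eq. 3): zero inside the tube of radius epsilon,
# linear in the excess distance outside it.
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    residual = np.abs(y_true - y_pred)
    return np.where(residual <= eps, 0.0, residual - eps)

# Residuals 0.05 and 0.30 with eps = 0.1 give losses 0.0 and 0.2
print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.30])))
```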

In this particular case, minimizing Eq. 2 is equivalent to solving the constrained quadratic programming problem [14, 15]:

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\mathbf{w}^{\text{T}}\mathbf{w} + C\sum_{i = 1}^{n} (\xi_{i} + \xi_{i}^{*}) \\ \text{subject to} \quad & \begin{cases} y_{i} - \mathbf{w}^{\text{T}}\mathbf{x}_{i} - b \le \varepsilon + \xi_{i} \\ \mathbf{w}^{\text{T}}\mathbf{x}_{i} + b - y_{i} \le \varepsilon + \xi_{i}^{*} \\ \varepsilon, \xi_{i}, \xi_{i}^{*} \ge 0 \end{cases} \end{aligned}$$
(4)

where \(\xi_{i}\) and \(\xi_{i}^{*}\) are slack variables and the constant \(C > 0\) determines the trade-off between model complexity and the tolerance of deviations larger than \(\varepsilon\). The optimization problem (Eq. 4) can be expressed in its dual form using Lagrange multipliers [16]. In this paper, we use the strategy outlined by [7], leading to the solution:

$$f(\mathbf{x}) = \sum_{i = 1}^{n} (\lambda_{i} - \lambda_{i}^{*}) K(\mathbf{x}_{i}, \mathbf{x}) + b$$
(5)

where \(\lambda_{i}\) and \(\lambda_{i}^{*}\) \((0 \le \lambda_{i}, \lambda_{i}^{*} \le C)\) are the Lagrange multipliers and \(K(\mathbf{x}_{i}, \mathbf{x})\) represents the kernel function [17]. In the context of Eq. 5, data points with nonzero \(\lambda_{i}\) or \(\lambda_{i}^{*}\) values are support vectors. A suitable kernel function makes it possible to map a nonlinear input space to a high-dimensional feature space where linear regression can be performed [7]. Several kernel functions have been proposed in the literature. In this paper, different kernel functions were used, viz. linear, RBF, polynomial (second and third order), and sigmoidal. The kernel parameters must be selected appropriately by the user, as the generalization performance of the SVR model depends heavily on the correct setting of these parameters. For a more detailed description of kernel functions and their parameters, the reader is referred to [18]. A schematic diagram of the SVR models used in this study is shown in Fig. 1b.
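For illustration, the kernel families compared in this study could be set up as follows in scikit-learn (a sketch only: the paper used Statistica, and the C, \(\varepsilon\), and \(\gamma\) values shown are placeholders, not the tuned values reported in Tables 4, 6, and 8):

```python
# Placeholder SVR models, one per kernel family used in the study; parameter
# values are illustrative assumptions, not the tuned values from the paper.
from sklearn.svm import SVR

models = {
    "linear":  SVR(kernel="linear", C=1.0, epsilon=0.1),
    "rbf":     SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.1),
    "poly2":   SVR(kernel="poly", degree=2, C=1.0, epsilon=0.1),
    "poly3":   SVR(kernel="poly", degree=3, C=1.0, epsilon=0.1),
    "sigmoid": SVR(kernel="sigmoid", C=1.0, epsilon=0.1),
}
# for name, svr in models.items():
#     svr.fit(X_train, y_train)   # single-output target, e.g., ADG
```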

2.3.1 Support vector regression training

SVR was implemented in the Statistica software [12]. The SVR parameters were obtained via tenfold cross-validation. The k-fold cross-validation algorithm improves on the holdout method, the simplest validation approach [19]: the data set is divided into k subsets and the holdout procedure is repeated k times, each time using one subset as the test set and assembling the remaining subsets into the training set; the average error across all k trials is then computed. The advantage of this method is that it matters less how the data are divided: every data point appears in a test set exactly once and in a training set k − 1 times, and the variance of the estimate decreases as k increases. The disadvantage is that the training algorithm has to be rerun from scratch k times, so an evaluation requires k times as much computation. A variant of this method is to randomly divide the data into test and training sets k different times [20].
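One plausible implementation of such a tenfold cross-validation search (the actual grid searched in the study is not reported, so the values below are assumptions) is:

```python
# Assumed grid search over SVR parameters with tenfold cross-validation.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {"C": [0.1, 1, 10, 100],
              "epsilon": [0.01, 0.1, 0.5],
              "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=10,
                      scoring="neg_mean_squared_error")
# search.fit(X_train, y_train)
# search.best_params_ holds the selected (C, epsilon, gamma) combination
```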

2.4 Performance evaluation

Goodness of fit of the NN and SVR models was based on the coefficient of determination (R²), mean square error (MSE), mean absolute error (MAE), and bias [21]:

$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{n} (x_{pi} - x_{ei})^{2}}{\sum\nolimits_{i = 1}^{n} (x_{ei} - \bar{x})^{2}}$$
$${\text{MSE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {(x_{pi} - x_{ei} )^{2} }$$
$${\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {x_{pi} - x_{ei} } \right|}$$
$${\text{Bias}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {(x_{pi} - x_{ei} )}$$

where \(x_{pi}\) is the predicted output for observation \(i\), \(x_{ei}\) is the experimental output for observation \(i\), \(\bar{x}\) is the average value of the experimental output, \(\left| \ldots \right|\) denotes modulus, and \(n\) is the total number of observations.
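These statistics translate directly into code; the sketch below mirrors the definitions above:

```python
# The four evaluation statistics defined above, computed with NumPy.
import numpy as np

def evaluate(x_p, x_e):
    x_p, x_e = np.asarray(x_p, float), np.asarray(x_e, float)
    r2 = 1 - np.sum((x_p - x_e) ** 2) / np.sum((x_e - x_e.mean()) ** 2)
    mse = np.mean((x_p - x_e) ** 2)    # mean square error
    mae = np.mean(np.abs(x_p - x_e))   # mean absolute error
    bias = np.mean(x_p - x_e)          # positive = over-prediction on average
    return {"R2": r2, "MSE": mse, "MAE": mae, "Bias": bias}
```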

3 Results and discussion

The ability of NN models to predict the response of broiler chickens to different dietary nutrients, and to predict the energy values of concentrate feedstuffs, has been demonstrated previously [22–25]. However, little is known about the suitability or otherwise of SVR models in animal (especially poultry) nutrition. One attempt in this area was a study by [23] of the ability of SVR models to estimate carcass characteristics of broilers influenced by dietary nutrient intakes. In that study, SVR models were developed using only the RBF kernel function, and the results showed that support vectors are worth considering as a possible alternative to NN models. As several indices can be used for model evaluation, it is difficult to identify an optimum model that best satisfies every evaluation criterion. Therefore, emphasis in the present study was given to the accuracy of the models developed in relation to the aforementioned criteria.

3.1 Average daily gain

Kernel parameters for predicting ADG of broiler chickens in response to protein and BCAA are summarized in Table 4. These values may serve as starting values when developing future SVR models. The optimal architecture for the MLP-type NN for modeling ADG had 4 inputs, 1 output (with a linear activation function), and 4 hidden neurons (with the hyperbolic tangent activation function), with a quasi-Newton algorithm [26] used for network training. The optimal structure for the RBF-type NN model had 4 inputs, 1 output, and 11 hidden neurons. All the models developed appear able to predict the ADG of broiler chickens satisfactorily. Scatter plots of predicted versus actual and residual versus predicted values of ADG obtained with the MLP-type NN and linear SVR models are shown in Fig. 2; these were the best and worst models, respectively, for predicting ADG. The regression equations and the correspondingly low R² values for the distribution of residual versus predicted values suggest little or no evidence of prediction bias.

Table 4 Kernel function parameters \((C, \varepsilon, \gamma)\) for the SVR models developed to predict average daily gain
Fig. 2 Scatter plots of predicted versus actual (top row) and residual versus predicted (bottom row) values of ADG obtained by the best (MLP-type NN) and worst (linear SVR) models developed (all 241 data lines represented)

Statistics for assessing the performance of each model are shown in Table 5. Based on these criteria, the MLP-type NN models showed a higher coefficient of determination than the SVR models. Among the SVR models developed with different kernels, the best and worst predictions were achieved with the polynomial (third order) and linear kernel functions, respectively. The performance of SVR models with RBF and polynomial (third order) kernels was better than that of the RBF-type NN model (Table 5). Our results showed that the MLP-type NN model is more accurate for predicting ADG than the RBF type. The results revealed good agreement between observed and predicted values of ADG for both the training and testing sets. A well-trained model gives balanced values of the evaluation statistics for these two sets, suggesting that over-fitting did not occur during the training process [24]. In agreement with our results, previous studies have shown the superiority of NN over SVR models in predicting the investigated output variables [27].

Table 5 Accuracy of the SVR and NN models developed to predict average daily gain

3.2 Feed efficiency

In general, feed costs account for about two-thirds of the total cost of poultry production [28]. In birds, FE can be calculated as grams of gain per gram of feed intake; adequate knowledge and prediction of FE can therefore provide economic benefits [29].
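For example (purely illustrative numbers):

```python
# FE as defined above: grams of gain per gram of feed consumed.
def feed_efficiency(gain_g, intake_g):
    return gain_g / intake_g

print(feed_efficiency(35.0, 50.0))  # hypothetical bird: 0.70 g gain/g feed
```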

The kernel parameters used to estimate FE are summarized in Table 6. The optimal architecture of the MLP- and RBF-type NN for modeling FE, suggested by the intelligent problem solver, was similar to that of the models for ADG: the MLP and RBF models were developed with 4 input variables and had 4 and 11 neurons in the hidden layer, respectively. Scatter plots of predicted versus actual and residual versus predicted values of FE obtained with the MLP-type NN and sigmoidal SVR models are shown in Fig. 3. The regression equations and the correspondingly low R² values for the distribution of residual versus predicted values suggest no evidence of prediction bias. As the figure illustrates, the model with the higher R² value for the distribution of observed versus predicted FE had the lower R² value for the distribution of residual versus predicted values, and vice versa.

Table 6 Kernel function parameters \((C, \varepsilon, \gamma)\) for the SVR models developed to predict feed efficiency
Fig. 3 Scatter plots of predicted versus actual (top row) and residual versus predicted (bottom row) values of feed efficiency (FE) obtained by the best (MLP-type NN) and worst (sigmoidal SVR) models developed (all 241 data lines represented)

The statistics used to assess performance of the models are shown in Table 7, and these indices indicate similar trends in goodness of fit across models. Overall performance was better for FE than for ADG (Tables 5, 7). For the training set, the best performance was achieved with the MLP-type NN model (R² = 0.87), followed by SVR with RBF and third-order polynomial kernel functions (R² = 0.85), the second-order polynomial SVR and RBF-type NN (R² = 0.82), the linear SVR (R² = 0.76), and the sigmoidal SVR (R² = 0.71). For the testing set, the highest accuracy was achieved with the MLP-type NN and the SVR models with RBF and third-order polynomial kernel functions (R² = 0.87). The accuracy of prediction, as well as the flexibility of the constructed models, affirms the strong effect of the selected dietary nutrients (protein, valine, isoleucine, and leucine) on FE in broiler chickens [30, 31]. In other words, the statistical analyses indicated that dietary protein and BCAA can be used to predict the FE of broilers accurately.

Table 7 Accuracy of the SVR and NN models developed to predict feed efficiency

3.3 Feed intake

Lehninger [32] classifies leucine as a ketogenic amino acid because its degradation yields the ketone body acetoacetate. Valine is a glucogenic amino acid, as its degradation yields succinyl-CoA, an intermediate of the TCA cycle. Isoleucine is classified as both ketogenic and glucogenic, as it is degraded in the body to acetyl-CoA and succinyl-CoA. Ketone bodies act, in both the central and peripheral nervous systems, as an inhibitory signal for feed intake in poultry [33]. Moreover, [34] demonstrated that excess leucine severely depresses feed consumption and weight gain of chicks fed a low-protein diet. For these reasons, it was assumed that the FI of broilers could be predicted in response to different levels of dietary protein and BCAA. However, the results showed that the overall performance of the models was not as good as that of the models developed to predict ADG and FE: the selected input variables (protein and BCAA) explained at most 0.68 of the variation in FI (R² for the MLP-type NN). The kernel parameters for predicting FI are summarized in Table 8. The MLP- and RBF-type NN models developed comprised 5 and 11 hidden neurons, respectively. For the FI models, as for ADG and FE, the best performance was achieved with the MLP-type NN and SVR with an RBF kernel function. The predictive ability of all SVR models, except that with the sigmoidal kernel function, was higher than that of the RBF-type NN model. The behavior of the MLP-type NN (best fitting) and sigmoidal SVR (worst fitting) models is shown in Table 9. Figure 4 shows scatter plots of predicted versus actual and residual versus predicted values of FI obtained with the MLP-type NN and sigmoidal SVR models. With the FI models, as with the FE models, the model with the higher R² value for the distribution of observed versus predicted FI had the lower R² value for the distribution of residual versus predicted values.

Table 8 Kernel function parameters \((C, \varepsilon, \gamma)\) for the SVR models developed to predict feed intake
Table 9 Accuracy of the SVR and NN models developed to predict feed intake
Fig. 4 Scatter plots of predicted versus actual (top row) and residual versus predicted (bottom row) values of feed intake (FI) obtained by the best (MLP-type NN) and worst (sigmoidal SVR) models developed (all 241 data lines represented)

4 Conclusions

In this paper, we propose SVR as a means of estimating the response of broilers to dietary protein and BCAA. The SVR approach was compared with MLP- and RBF-type NN models. Based on the results of this study, we conclude that it is feasible to apply SVR and NN models to predict the performance response of broiler chickens (in terms of ADG, FE, and FI) to dietary protein and BCAA. Models derived using these two approaches (SVR and NN) offer an alternative to those obtained from the standard statistical meta-analysis employed in animal nutrition [42, 43] and might well provide general models for common usage; further research, however, is needed to explore these assertions. MLP-type NN models appear to provide a better option than SVR for estimating the output variables investigated. Among the different kernel functions applied here, the RBF and third-order polynomial functions performed better than the other SVR kernel functions and the RBF-type NN models. However, literature information on the application of SVR in poultry nutrition remains scarce, and more studies are needed to examine further the use of these models.