1 Introduction

1.1 Background

There has been a renewed research interest on biological hydrogen production because of the growing global environmental concerns regarding depletion of fossil fuel and expected drastic environmental condition in coming future. Hydrogen is considered as promising, alternative to fossil fuel and clean energy carrier without any emission of carbon dioxide or hazardous material on burning in contrast to other conventional fuels. Producing H2 from renewable feedstock could potentially alleviate many environmental, social and political problems associated with using fossil fuels [6, 34, 35].

Several processes may be applied to produce hydrogen including electrolysis of water, thermocatalytic reformation of hydrogen (steam reforming)-rich organic compounds and biological processes [8]. Fermentative hydrogen production can play the dual role of waste reduction and energy production using organic wastes as substrate [9, 29, 34, 35]. Many factors such as temperatures, initial pH and substrate concentrations can influence the fermentative hydrogen production, as these factors can affect the activity of hydrogen-producing bacteria by influencing the activity of some essential enzymes such as hydrogenases for fermentative hydrogen production [18, 31].

1.2 Problem statement

Recently, a traditional model had been developed to predict the hydrogen yield utilizing Box–Wilson design (BWD) approach by [2]. In fact, BWD approach could provide acceptable level of accuracy for predicting the hydrogen yield; however, BWD approach has some limitations. Such approach could be applied for limited records of the available data set due to its mathematical procedure. In addition, it was reported that BWD still introduced relatively poor accuracy in peak values of the hydrogen yield. As a result, it is still required to investigate other methods that could be provide robust model that mimic the pattern and generalized the hydrogen yield with respect to the input pattern.

A lot of systematic approaches have been introduced to facilitate the investigations regarding influence of parameters effecting production yield. Significant progress in the field of nonlinear pattern recognition and system control theory has made advances in a branch nonlinear system theoretic modeling called artificial neural network (ANN). An ANN is a nonlinear mathematical structure, which is capable of the representing arbitrarily complex nonlinear processes that relate the inputs and outputs of any system. ANN models have been used successfully to model complex nonlinear input–output time series relationships in a wide variety of fields [20].

Over-fitting has often been addressed using techniques such as weight decay, weight elimination and early stopping to control over-fitting (Weigend et al. 1992). Among these methods, early stopping is the most well-known solution (Prechelt 1998). However, using this method on time series of complex systems’ behavior stops the training process too early and the chance of detecting meaningful relations between the network outputs and actual behavior of the complex system does decrease. This indicates that the resulting model will not have proper features for predicting the time series of the system’s behavior. In the proposed method, we do not focus on removing the over-fitting problem for a single neural network. Instead, the major effort is to find an algorithm which is applied on the outputs of the over-fitted networks to produce the correct results. This algorithm will be presented in the following sections.

1.3 Objective

The main objective of this study is to propose the ANN technique as a prediction model for hydrogen production, especially from Clostridium saccharoperbutylacetonicum N1-4 (ATCC 13564; CSN1-4) using dark fermentation. The model architecture is based on varying variables, such as initial glucose concentration, initial medium pH and reaction temperature. Furthermore, a regularization method has been introduced to the ANN model in order to consider vast pattern of the hydrogen yield. Finally, for further verification and evaluation of the ANN performance, comparison analysis has been carried out over the BWD approach.

2 Materials and methods

2.1 Experimental setup and data collection

CSN1-4 culture stock was obtained from a culture collection maintained at the Chemical Engineering Department, UKM and reported previously by [2, 17]. A solution of 15 % PG medium per liter of distilled water was used as a growth medium for the inoculum. This medium was incubated in boiling water for 1 h and then filtered through cotton cloth. The filtrate was sterilized in an autoclave at 121 °C for 15 min. TYA medium was used for the preculture and main culture, and the composition of this medium per liter of distilled water was 40 g glucose, 2 g yeast extract, 6 g Bacto-Tryptone, 3 g ammonium acetate; 10 mg FeSO4·7H2O, 0.5 g KH2PO4 and 0.3 g MgSO4·7H2O per liter of distilled water [3, 4, 30]. The experimental methods reported in this work were adapted from earlier studies published in the literature [3, 4, 37], Miller 1959).

2.2 Box–Wilson design (BWD)

Box–Wilson design is a statistical technique to investigate the impact of the experimental variables on the response (output) that use central composite design (CCD) to use a response surface design, which are commonly chosen for the purpose of response optimization. BWD always based on Newton statistical method and also depend on the numbers of the variables [10, 22]. BWD technique can effectively be used with three variables to relate the response output and variable’s inputs. Equation 3 was used to fit the experimental data of hydrogen yield to construct the model, and it was used by [5] who reported that the BWD model was used to relate the response and three variable inputs.

$$ y = a_{0} + a_{1} T + a_{2} {\text{pH}} + a_{3} S + a_{4} T*{\text{pH}} + a_{5} T*S + a_{6} S*{\text{pH}} + a_{7} T^{2} + a_{8} {\text{pH}}^{2} + a_{9} S^{2} $$
(1)

where the terms (a 0a 9) in the above model are coefficients of the multiple regression analysis. Many reports employed BWD model to evaluate the effects of variables on the response [5, 32]. The values of these coefficients and statistically insignificant terms for the model which representing the suitable form of the mathematical model that relates the hydrogen yield (y) to the three variables in terms of levels are listed in Table 1.

Table 1 Regression coefficients of the response surface model for hydrogen yield

2.3 Artificial neural networks

Artificial neural networks are densely interconnected processing units that utilize parallel computation algorithms. ANN is also known as connectionism, parallel distributed processing, neuron-computing, natural intelligent systems and machine learning algorithms. The basic advantage of ANN is that they can learn from representative examples without providing special programming modules to simulate special pattern in the data set [7]. This allows ANN to learn and adapt to a continuously changing environment [14, 24].

While ANN does not provide a closed form mathematical model for the problem, they do offer accurate models based on the learning procedure. Neural networks (NN) are composed of simple elements operating in parallel. These elements are inspired by biological nervous system, and the network functionality is determined by the connection between them. The NN can be trained to perform a particular function by tuning the values of the weights (connections) between these elements [11]. The training procedure of NN is performed so that a particular input leads to a certain target output as shown in Fig. 1.

Fig. 1
figure 1

Artificial neural networks model

2.4 Hydrogen yield prediction using ANN

The inputs to the network are fixed length successive sequence of its recent behavior. The inputs are used to predict the next time-step. The general behaviors of the complex system are saved in the layers of networks. In the prediction stage, the input data together with this overall total behavior are presented to the hidden layers. The output of the hidden layer becomes a well-conditioned result of total system behavior, and then, the prediction can be done after this stage (Fig. 2).

Fig. 2
figure 2

The neuron model

The input and the output layers of any network have numbers of neurons equal to the number of the inputs and outputs of the system, respectively. The layers between the input and the output layers are known as hidden layers. The number of neuron and hidden layers can be arbitrarily chosen and adjusted until the function can map the desired output. It has been proven that a network of two layers that utilizes a sigmoid and a linear transfer function in its first and second layer, respectively, can be trained to model any nonlinear relation [1113, 23].

In order to accelerate the training procedure and to achieve minimum mean square estimation error, the input data was normalized. Different MLP–ANN architectures (while keeping three neurons in the input layer and only one neuron in the output layer) were used to examine the best performance. The choice of the number of hidden layers and the number of neurons in each layer is based on two performance indices [21]. The first is the root mean square value of the prediction error, and the second is the value of the maximum error. The ANN-based architecture is employed in this study to provide the hydrogen yield as a response with different variables as presented in Fig. 3. Once the network weights and biases are initialized, during training process, the weights and biases of the network are iteratively adjusted to minimize the network performance function–mean square error (MSE)—the average squared error between the network outputs a and the target outputs t. Although the proposed ANN model reach the pre-defined performance goal while training, in case, we expand the training data set, the performance while testing showed relatively poor result, which means that the ANN model while training experienced over-fitting problem. In order to overcome the over-fitting problem and improve the model performance while testing that the model could be generalized for different data pattern, a regularization procedure is introduced to the ANN model.

Fig. 3
figure 3

The exact neural network architecture for three inputs

2.5 Neural network generalization

Network over-fitting is a classical machine learning problem that has been investigated by many researchers (Schaffer 1993; Stallard and Taylor 1999). Network over-fitting usually occurs when the network captures the internal local patterns of the training dataset rather than recognizing the global patterns of the datasets. The knowledge rule base that is extracted from the training dataset is therefore not general. As a consequence, it is important to recognize that the specification of the training samples is a critical factor in producing a neural network capable of making the correct responses. The problem of over-fitting has also been investigated by researchers with respect to network complexity [28] (Ooyen and Nienhuis 1992 and Livingstone 1997).

Here, to avoid an over-fitting problem, we utilized the regularization technique (Nordström and Svensson, 1992). This is known as a suitable technique when the scaled conjugate gradient descent method is adopted for training, as is the case in this study. The regularization technique involves modifying the performance function which is normally chosen to be the sum of squares of the network errors on the training set defined as follows:

$$ {\text{MSE}} = \frac{1}{2}\sum\limits_{P = 1}^{n} {(Y_{O} - Y_{P} )^{2} } $$
(2)

The modified performance function is defined by adding a term that consists of the mean of the sum of squares of the network weights and biases to the original MSE function as follows:

$$ {\text{MSE}}_{\text{reg}} = \gamma \; \times {\text{MSE}}\; + \;\left( {1 - \gamma } \right)\; \times \;{\text{MSW}} $$
(3)

where γ is the performance ratio that takes values between 0 and 1, and MSW is computed as follows:

$$ {\text{MSW}} = \frac{1}{M}\sum\limits_{j = 1}^{M} {w_{j}^{2} } $$
(4)

where M is the number of weights utilized inside the network structure and w is weight matrix of the network. Using the performance function of Eq. (10), the NN to predict the hydrogen production were developed with the intention to avoid over-fitting of data.

2.6 Model performance

There are varieties of performance function. The more using of performance functions for checking error the better accuracy are shown. Therefore, this study presented different methods for investigating model performance. These were mean absolute relative error (MARE), mean absolute error (MAE) and MSE. They evaluated models by the following equations:

$$ {\text{MARE}} = \frac{1}{N}\sum {\frac{{\left| {y - y^{\prime } } \right|}}{y}} *100 $$
(5)
$$ {\text{MAE}} = \frac{{\sum {\left| {y - y^{\prime } } \right|} }}{N}*100 $$
(6)
$$ {\text{MSE}} = \frac{{\sum {\left( {y - y^{\prime } } \right)^{2} } }}{N} $$
(7)

where, y is the experimental Hydrogen production, y′ predicated Hydrogen production by model and N is the total number of data. R 2 was resulted by linear correlation between actual and simulated data.

In addition, data training performances in this study were evaluated using Nash–Sutcliffe coefficient of efficiency (E ns ). The efficiency E ns is a widely used statistical index to describe forecasting accuracy of hydrological models. The E ns with value of one means a perfect fit between model and observed values. And practically, the E ns with a value of greater than 0.9 means a satisfactory performance results and a value greater than 0.95 means a good performance results between the two data series. The E ns for water level forecasting is defined as follows:

$$ E_{ns} = 1 - \frac{{\sum {\left( {y_{m} - y^{\prime } } \right)^{2} } }}{{\sum {\left( {y_{m} - y} \right)^{2} } }}. $$
(8)

3 Results and discussion

3.1 Results utilizing BWD approach

A nonlinear least-squares regression program based on Gauss–Newton method was used to fit model 5.6, and this fitting provides the predicted hydrogen yield (y), the residual error and the coefficients (a n ) of this equation and fitted response presented as model 4. This model was used to verify form 4 by using ten experimental runs as calibration and fitted the five experimental runs as validation. Figure 4 presented the prediction of hydrogen by BWD statistical with correlation coefficient (R 2 = 0.895).

$$ y = - 649.6 - 26.47T + 58.66{\text{pH}} + 4.99S - 0.32T_{c}^{2} - 3.25{\text{pH}}^{2} - 0.042S^{2}. $$
(9)
Fig. 4
figure 4

Hydrogen yield (%) predicted by BWD statistical

3.2 Results of the modeling abilities of ANN

The ANN model architecture demonstrated in Fig. 3 is employed to provide the numerical data of hydrogen yield from ten experiments associated with temp, pH and initial glucose concentration and was used to train the ANN model to achieve the MSE target successfully. The training curve for the proposed ANN architecture presented in Fig. 3 is demonstrated on Fig. 5 showing convergence to the target MSE of 0.0001 after 127 iterations. Results showed based on ANN model found high of accuracy and efficiency to achieve prediction error lower than to CCD that agreed to [36] who reported that root MSE and the standard error of prediction for the neural network model were much smaller than those for the response surface methodology mode. This indicting that the neural network model had a much higher modeling ability than the response surface methodology model (Nagata and Chu 2003).

Fig. 5
figure 5

Iteration curve for the process of hydrogen yield to 15 runs using ANN

Typically, many such input/target pairs are used to train a network. Backpropagation (BP) uses input vectors and corresponding target vectors to train an NN. The NN with a sigmoid and linear output layer are capable of approximating any function with a finite number of discontinuities [27]. The standard BP algorithm is a gradient descent algorithm, in which the network weights are changed along the negative of the gradient of the performance function [11, 26]. There are a number of variations of the basic BP algorithm, which are based on other optimization techniques, such as conjugate gradient and Newton methods. Model ANN technique has the ability to predict 60 experimental and more based on using multilayer perceptron (MLP–NN) as presented in Fig. 3.

Figure 6 shows the performance of the proposed ANN utilizing the same dataset presented for the BWD which is including 15 experiments. It can be depicted that ANN outperformed the BWD model and could provide higher accuracy and more consistent level of accuracy for hydrogen yield at the same conditions with correlation coefficient (R 2 = 0. 984) while standard error in BWD less than ANN. This result is almost equivalent to the result reported by [36], who was investigated the effect of temperature, initial pH and glucose concentration on fermentative hydrogen production by mixed cultures in batch test and found the neural network model much higher modeling.

Fig. 6
figure 6

Hydrogen yield (%) predicted by ANN

For further analysis, the prediction error distribution as a statistical index for model evaluation was used as follows:

$$ {\text{PE}} = \left( {\frac{{H_{\text{actual}} - H_{\text{predicted}} }}{{H_{\text{actual}} }}} \right)*100 $$
(10)

Figure 7a, b shows the error distribution for the model output during training Case #1, (exp# 1–10) and testing session (exp # 11–15), respectively. It can be observed from Fig. 7a that the ANN model could provide significant level of high accuracy with error lower than 6 %; on the other hand, higher levels of errors have been observed during testing session as presented in Fig. 7b because of the significant changes in the input pattern for the model. However, the ANN model still provides acceptance level of error lower than 10 % except in one case (exp # 13). This result showed that the neural network could be successfully describe the effects of temperature, initial pH and glucose concentration on the hydrogen yield and agreed to report by [34, 35].

Fig. 7
figure 7

ANN performance for Case #1, a training process of hydrogen yield using ANN, b testing process of hydrogen yield using ANN

For more verification, Table 2 shows a comparison for prediction of hydrogen yield using the MLP–NN model and the BWD model. While the MLP–NN model was able to reduce the error in predicting hydrogen yield to be less than the ±6 %, BWD model was not able to achieve similar level of accuracy. The performance of the MLPNN model in Table 2 columns (5, 6) showing only one case similarity (exp #4) and provides relatively lower accuracy for (exp # 8), whereas MLP–NN outperformed the BWD for predicted hydrogen yield in 13 experiments out of 15. As a result, with MLP–NN model is much advantageous rather than BWD to predict the hydrogen yield with utmost accuracy using number of variables and experimental patterns.

Table 2 Experimental design for constructing statistical and ANN models

For further assessment, the proposed ANN model was examined utilizing 60 records of hydrogen yield experiments, Case #2. In fact, it is important to evaluate the performance of the prediction model considering wide range of the stochastic pattern of the hydrogen yield. Therefore, the proposed ANN model architecture in Fig. 3 is re-arranged to consider total 60 records of hydrogen yield experiments, out of which 50 records were fixed as the training session and the last 10 records for the testing session. Figure 8a illustrates that the proposed ANN model could provide hydrogen yield prediction within error less than 5 % except in two cases (exp # 2 and exp # 38). On the other hand, during the testing session, the ANN model achieve prediction error lower than 20 % as shown in Fig. 8b; this is due to the highly stochastic pattern experienced in the data records (exp # 51 to exp # 60).

Fig. 8
figure 8

ANN model performance for Case #2, a training process of hydrogen yield using ANN, b testing process of hydrogen yield using ANN

To validate the functionality of the proposed ANN model with generalization, comparison analysis between the performances of different prediction models has been carried out. Table 3 shows MARE, MAE, MSE and E ns values of the hydrogen yield prediction utilizing MLP–NN without generalization, ANN with generalization and BWD models. Results from Table 3 shows that ANN with generalization model provides remarkable improvement over the other model for all evaluation indexes.

Table 3 MARE, MAE, MSE and E ns associated with the output of BWD and MLP–NN with and without generalization for hydrogen yield models

3.3 Further developments in ANN for predicting hydrogen yield

It is common in ANN development to train several different networks with different architecture and to select the best one on the basis of performance of the networks with testing/validation sets. A major disadvantage of such an approach is that it assumes that performance of the networks for all other possible testing sets will usually be similar, which is statistically incorrect. Moreover, observing the performance of the fifteen developed ANN models tested with the four testing sets, making it obvious that no single network has the optimal prediction for all the testing data sets. Therefore, a better accuracy than the best reported by any single network can be accomplished if an optimized algorithm could be developed to utilize all these networks.

Another interesting observation is that, the effect of the transfer function is as important as the number of layers and neurons in each layer. This can be observed when comparing the performance of two networks with similar number of hidden layers and neurons but with different transfer functions. Further discussion on the effect of the optimal combination of different transfer function for specific applications is beyond the scope of this study.

In general, the application of neural network in hydrogen yield is promising together with the proposed regularization procedure. However, the proposed ANN model approaches still lacking for appropriate method for searching the best architecture. In addition, preprocessing for the data is an essential step for time series forecasting model and required more survey and analysis that could lead to better accuracy in our application. The selection of key parameter set and components with ANN model and variable selection procedures (input pattern) in hydrogen yield were attempted in this study. However, the optimal selection of the key parameter still required to be achieved by augmenting the ANN model with other optimization model such as genetic algorithm or particle swarm optimization methods. On the other hand, the variable selection (input pattern) in ANN model is always a challenging task due to the complexity of the prediction modeling. Some other advanced ANN model, namely dynamic neural network (DNN) that consider the time-dependent interrelationship between the input and output pattern could be investigated and might provide better prediction model. The investigation and application of more robust input pattern selection approaches, such as systematic searching of optimal or near optimal variable combination in DNN with regularization procedure, would be desirable in prediction of hydrogen yield research studies.

4 Conclusion

In this research, ANNs substantiate to be successfully used to predict hydrogen yield using CSN1-4 with three variables: reaction temperature, initial medium pH and initial glucose concentration. The proposed ANN-based model reliably predicts hydrogen yield and could be used as a predictive controller for management and operation of large-scale hydrogen-fermenting systems. The neural network with its nonlinear architecture could provide significant level of accuracy to predict hydrogen yield under different stochastic pattern of temperature, initial pH and glucose concentration. The results showed that the proposed ANN model achieve consistent level of accuracy for hydrogen yield (HY) while training and testing stage for HY prediction within maximum error of (6 %) in case of utilizing 15 data records. Furthermore, a comparison analysis with a traditional approach BWD statistical has been introduced and shows that the ANN model output significantly outperformed the BWD. In addition, the results showed that the proposed ANN widens the range of the hydrogen yield prediction with the consideration of different level of stochastic pattern of the input up to 60 records. Consequently, the ANN overcomes the limitation of the BWD approach, which is merely considering limited length of stochastic pattern for hydrogen yield (15 records).