1 Introduction

Wastewater effluents from textile industries are characterized by high volumes and extremely variable composition, which can include biodegradable and non-biodegradable dyes, organic matter, salts and toxic substances (Isik and Sponza 2004). The variety of raw materials, chemicals, processes and also technologic variations applied to the processes cause complex and dynamic structure of environmental impact of textile industry (Sapci and Ustun 2003). These industries have shown a significant increase in the use of synthetic complex organic dyes as the colouring material. Therefore, the discharge of dye house wastewater into the environment is aesthetically displeasing, impedes light penetration, damages the quality of the receiving streams and may be toxic to treatment processes, to food chain organisms and to aquatic life (Talarposhti et al. 2001).

With environmental regulations becoming more stringent, regulatory compliance has become a matter of increasing concern, and there is a need to conduct more effective process strategies. Better control of the process may be achieved by the use of a robust model to predict certain key parameters based on past observations. Because of their reliable, robust and salient characteristics in capturing the nonlinear relationships existing between variables (multi-input/output) in complex systems, it has become apparent that numerous applications of ANNs have been successfully conducted in various parts of environmental and water resources engineering fields.

A two-part study was undertaken to investigate the role of ANNs in hydrology (ASCE Task Committee 2000a, b). It was found that ANNs were robust tools for modeling many of the non-linear hydrologic processes such as rainfall-runoff, stream flow, ground-water management, water quality simulation, and precipitation. However, Maier and Dandy (2000) suggested that future research efforts should be directed towards the development of ANN models for representation of internal network parameters, choosing appropriate model inputs and stopping criteria, and optimizing network geometry adequately. For instance, Oliveria-Esquerre et al. (2002) used the quickprop (QP) and delta-bar-delta (DBD) methods in determining the best ANN internal representation, and determined the stopping criteria based on the mean square error (MSE). The authors also reported that the PCA technique helped the nonlinear ANN mapping by its orthogonal transformation of variables and reduction of system dimensionality. Mohanty et al. (2002) demonstrated an ANN simulation of the performance of a biological activated-carbon filter. They concluded that the ANN model could reasonably estimate the COD reduction in an exhausted fitler, and gave much better results than a second-order polynomial regression model.

Cigizoglu (2003a) investigated the applicability of ANNs to forecasting, estimation and extrapolation of the daily flow data. The study resulted that an ANN solution could provide a tighter fit to the data than conventional models. Another study was undertaken by Cigizoglu (2003b) to apply autoregressive moving average (ARMA) models into flow forecasting using ANNs. The study concluded that the extension of input and output data sets in the training stage improved the accuracy of forecasting using ANNs. Cigizoglu (2004) investigated the performance of MLPs in daily suspended sediment estimation and forecasting. The author concluded that MLPs captured the complex non-linear behaviour of the sediment series relatively better than the conventional models. Another study was conducted by (Cigizoglu 2005a) on the generalized regression neural network (GRNN), for intermittent river flow forecasting and estimation. It was reported that the GRNN simulations did not face the frequently encountered local minima problem in feed forward back propagation method (FFBP) applications, and GRNNs did not generate forecasts or estimates that were not physically plausible. Similarly, Cigizoglu (2005b) concluded that the GRNN approach did not require an iterative training procedure, unlike the FFBP method and GRNN forecasting performance was found to be superior to the FFBP, statistical, and stochastic methods in terms of the selected performance criteria.

Onkal-Engin et al. (2005) used an ANN trained with a BP algorithm to determine the relationship between sewage sample odours and their related Biochemical Oxygen Demand (BOD) values. They concluded that ANNs could be successfully used to classify the sewage samples collected from different locations of a WWTP. Daneshvar et al. (2006) developed an ANN model to predict the performance of decolorization efficiency by electrocoagulation process. They concluded that the ANN model could describe the behaviour of the complex reaction system with the range of experimental conditions adopted. Grieu et al. (2006) studied Kohonen’s self-organizing maps (KSOM) and MLP neural networks for on-line estimating the efficiency of an activated sludge process. The authors concluded that KSOM and MLP were proven to be efficient and complementary in charge of the plant monitoring and maintenance. Another study was carried out by Molga et al. (2006) to model a full-scale industrial WWTP using ANNs. The study resulted that sufficiently good accuracy of the dynamic behaviour predictions were obtained with experimental data for the considered plant. Cigizoglu and Kisi (2006) studied on methods to improve the neural network performance in suspended sediment estimation. They concluded that the range-dependent neural network (RDNN) method was found to be superior to conventional ANN applications, where only a single network was trained considering the entire training data set.

Perendeci et al. (2007) proposed a conceptual neural-fuzzy model based on adaptive-network-based fuzzy inference system (ANFIS) to estimate effluent Chemical Oxygen Demand (COD) of a full-scale anaerobic wastewater treatment plant for a sugar factory operating at unsteady state. They concluded that the correlation coefficient between estimated and measured values of output variable could be increased to the value of 0.8940, which was considered a good fit. Alp and Cigizoglu (2007) estimated the daily total suspended sediment load on rivers using two different ANN algorithms, the FFBP method and the radial basis functions (RBF). They concluded that the simulations provided satisfactory results in terms of the selected performance criteria comparing well with conventional multi-linear regression. Finally, Ráduly et al. (2007) carried out studies on ANNs for rapid WWTP performance evaluation. With correlation coefficients higher than 0.95 and prediction errors lower than 10%, the study resulted that the accuracy of the ANN was sufficient for applications in simulation-based WWTP design and simulation of integrated urban wastewater systems.

Although deterministic models (also called as white-box models) give a good insight into the mechanism, they require a lot of hard work before applying to a specific wastewater treatment plant. Because kinetic parameters and wastewater characteristics can show some fluctuations in different periods of time when the operating conditions are applied on a regional scale, calibration of these models are extremely time consuming, laborious, and needs extensive laboratory and computer work. However, ANNs provide a speedy and practical manner for the control engineer to make a modification and/or a supplement in the considered process and also to develop an effective continuous monitoring with regard to the discharge standards. In addition, calibration of ANN models is easier than the white-box models as there are fewer parameters used in the model development process. When the measured variables start showing difference with the response of ANN, the model can be re-trained using the newer data used for cross checking. This process can be automated by embedding the ANN model in an expert system that controls the complete system. Consequently, potential advantages and benefits of the ANN technique have been highlighted by many substantial research activities above. For these reasons, an ANN model was proposed in estimation of COD removal efficiency (CODRE) of up-flow anaerobic sludge blanket (UASB) reactors treating diluted real textile wastewater in this study.

In this work, CODRE of UASB reactors being output parameter of the considered process was predicted by nine input variables such as hydraulic retention time (HRT), pH, influent COD concentration (CODi), operating temperature, alkalinity, volatile fatty acids (VFA) concentration, dilution ratio, organic loading rate (OLR) and total suspended solids (TSS) concentration using a three-layer ANN model. In addition to the ANN approach, a linear-nonlinear study was also carried out to investigate the effects of each input variable on CODRE values. Regression variable results including standard error, the t-statistics and the corresponding P values for each parameter were also presented. Finally, results obtained from stochastic modeling approaches proposed in this study were compared and advantages and further developments were discussed.

2 Materials and methods

2.1 Experimental set-up and operation

Three UASB reactors having a total volume of 1.2 l were operated for 80 days at mesophilic conditions (36–37.5°C) in a temperature-controlled water bath (Clifton) with two hydraulic HRTs of 4.5 and 9.0 days. Imposed volumetric OLR ranged from 0.072 and 0.602 kg COD/m3/day. Stability of the treatment process and components of wastewater samples were monitored in Environmental Engineering Laboratory at Yildiz Technical University. A detailed schematic of the experimental set-up is illustrated in Fig. 1.

Fig. 1
figure 1

Detailed schematic of the experimental set-up

The reactors were inoculated with granular biomass (25% of the working volume) obtained from Tekel Brewery Inc. (Istanbul, Turkey). The reactors then were filled to their respective volumes with textile wastewater (61% of the total volume). After the start-up period, the real textile wastewater obtained from the effluent of a cotton textile house in Istanbul, Turkey fed to the reactors with domestic wastewater. Five different dilution ratios of 15, 30, 40, 45 and 60% with domestic wastewater were employed, respectively. Characteristics of studied real cotton textile wastewater are given in Table 1.

Table 1 Characteristics of real cotton textile wastewater

The study was undertaken in a pH range of 6.20–8.06 and an alkalinity range of 1,350–1,855 mg/l CaCO3. During the experimental study, the pH and alkalinity were controlled in optimal process conditions, as suggested in the literature (Ozturk 1999). Depending on the feed characteristics, pH of the influent wastewater was adjusted by the gradual addition of 1 N H2SO4 and 1 N NaOH reagents (Merck Chemical Corp.). The concentrations of VFA and TSS ranged between 420 and 720 mg/l CH3COOH and 68–338 mg/l, respectively. Imposed CODi concentrations ranged from 651 to 4,044 mg/l. The variation in the feeding COD concentrations are due to fluctuations in domestic wastewater characteristics, and also different dilution ratios applied in experiments.

In feeding, different target HRTs were achieved using a peristaltic pump (Watson–Marlow 505 L, QC Approved). Gas was collected from the headspace (14% of the total volume) on the top of the reactors and gas production was measured by the liquid displacement method. The gas collection system consisted of a gas collecting pipe, a gas bag, a gas collecting tube and a measuring tube.

Six anaerobic reactors having a total volume of 200 ml were also operated to determine COD fractions of wastewater samples. These reactors were conducted for about 75 days at mesophilic conditions (36–37.5°C) maintained by an adjustable aquarium heater with thermostat (Otto Aquarium Company, Taiwan). Each of six reactors was seeded with 30 mg/l of acclimated granular sludge and homogenized with 100 ml of textile wastewater. Results obtained from inert COD study are summarised in Table 2 (Sapci 2002).

Table 2 COD fractions of real cotton textile wastewater and domestic wastewater

2.2 Description of the input parameters and the output parameter

HRT is a measure of the amount of time the digester liquid remains in the digester. Hydraulic retention time is crucial because if the feed does not stay in the reactor long enough for the entire treatment process to take place, biogas will not be produced (Garcelon and Clark 2000).

Anaerobic bacteria, specially the methanogens, are sensitive to the acid concentration within the digester and their growth can be inhibited by acidic conditions. Because the methane step is the rate-limiting step, pH should be kept near to 7. The optimal pH for bacterial growth of anaerobic organisms is in the range of 6.5–8.2 and consequently for a rapid sludge granulation the reactor pH should be maintained at this range (Ozturk 1999).

Influent COD concentration is used as a measure of organic strength of feed. COD of the wastewater is the measured amount of oxygen needed to chemically oxidize the organics present (Kinson et al. 2001).

There are mainly two temperature (mesophilic and thermophilic) ranges that provide optimum treatment conditions for an effective COD removal and methane production in anaerobic treatment. Temperature affects the activity and the growth of microorganisms. Most of the experiments carried out so far were conducted at 30°C, but it is well known that the optimal temperature for mesophilic growth is situated near 40°C (Amatya 1996).

The pH in anaerobic system is controlled by the interaction of the carbon dioxide/bicarbonate buffer system and a net strong base which is the summation of all strong acids and bases including volatile fatty acids and ammonia. In case of pH decrease, it can be maintained in the optimum range by the buffering effect of alkalinity (Amatya 1996).

In anaerobic treatment, change in VFA concentration is the most sensitive parameter, the reason being that the primary cause of digester failure hinges around imbalance between acidogenic, acetogenic and methanogenic organisms (Lahav and Loewenthal 2000). When the concentration of undissociated VFA remains high for prolonged periods, methanogens are slowly wiped out and acetogens predominate in the biorectors. Hence, the controlling of VFA accumulation is inevitable to obtain an effective COD removal and methane production.

Because textile industries have seasonal fluctuations in human activities, the ratio of domestic effluents mixing into the textile wastewater shows diversity. Hence, different dilution ratios of 15, 30, 40, 45 and 60% with domestic wastewater were employed to represent seasonal fluctuations in this study.

OLR is an important parameter significantly affecting microbial ecology and characteristics of UASB systems. This parameter integrates reactor characteristics, operational characteristics, and bacterial mass and activity into the volume of media (Torkian et al. 2003).

Suspended solids are present in sanitary wastewater and many types of industrial wastewater. As levels of suspended solids increase, a water body begins to lose its ability to support a diversity of aquatic life. In this study, TSS concentration was selected as a control parameter on effluent quality requirements.

CODRE of UASB reactors being output parameter was considered as a measure of treatment performance. CODRE value is defined as follows:

$$ {\text{CODRE }}(\% ) = {\left( {\frac{{{\text{COD}}_{{\text{i}}} - {\text{COD}}_{{\text{e}}} }} {{{\text{COD}}_{{\text{i}}} }}} \right)} \times 100 $$
(1)

where CODi and CODe are the influent and effluent COD concentrations, respectively.

2.3 Analytical methods

Input parameters used in the proposed models such as pH, CODi, Alkalinity, VFA and TSS of wastewater samples were determined by the procedures described in Standard Methods (APHA-AWWA 1995). These parameters were determined by Open Reflux Method for COD, Distillation Method for VFA, Gravimetric Method for TSS, and Titration Method for Alkalinity, respectively. The pH was measured by Jenway 3040 Ion Analyser. Concentrations of other components given in Table 1 were analyzed by Direct Air Acetylene Flame Method for heavy metals, Macro–Kjeldahl Method for TKN, Persulfate Digestion Method for TP, Visual Comparison Method for Color, Soxhlet Extraction Method for Oil and Grease, MBAS Method for detergents, and Gravimetric Method with drying of residue for Sulfate. During the study, the operational temperature of the reactors was detected by a temperature probe and monitored with a digital thermometer. Monitoring the performances of three reactors for an 80 days of continuous experimental study after the start-up period, 227 data points for each input variables were obtained in total. Variation of input parameters and CODRE values for 227 experimental observations are given in Fig. 2.

Fig. 2
figure 2

Variation of input parameters and CODRE values for 227 experimental observations

2.4 Definition of the neural network model

In this study, Neural Network Toolbox V4.0 of MATLAB® was used to predict CODRE values. A total of nine operating parameters which are effective in prediction of CODRE of UASB reactors, such as HRT (day), pH, influent COD concentration (mg/l), operating temperature (°C), alkalinity (mg/l CaCO3), VFA concentration (mg/l CH3COOH), dilution ratio (%), OLR (kg COD/m3/day) and TSS (mg/l) concentration were selected as the ANN model inputs [p] and CODRE values obtained from the experimental study were taken as the output [t] (Table 4). The ANN model used back-propagation (BP) algorithm to predict CODRE values based on input vector (9 × 227) of nine operating variables and target vector (1 × 227) obtained from the experimental study. Operating ranges of nine input variables considered in the ANN modeling and their descriptive statistics are given in Table 3. The data gathered from the experimental study was divided into input matrix [p] and target matrix [t].

Table 3 Data statistics of model variables considered in the ANN modeling

The first step, the data was loaded into the workspace. Original network inputs and targets given in the matrices [p] and [t] were normalized using an algorithm code, prestd. The normalized inputs and targets, pn and tn were gained zero means and unity standard deviation. The mean and standard deviations of the original inputs and targets were defined before the network had been trained. Moreover, the matrices containing the transformed input vectors and the principal component transformation matrix were defined, respectively. All these vectors were used in transformation for future inputs.

The next step, principal component analysis (PCA) was performed as an effective procedure for the determination of input parameters. In some situations, the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). It is useful in this situation to reduce the dimension of the input vectors. This technique (featuring an algorithm code, prepca, which performs PCA) eliminated principal components that made the least contribution (less than 0.1%) to variation in the data set, so it can be said that principal components which accounted for 99.9% of the variation were used. It was observed that there was a redundancy in the data set and size of the transformed data after the computation. The PCA analysis reduced the size of input vectors to eight input parameters. This step was followed by the division of the original data into training, validation and testing subsets.

One fourth of the the original data was taken for the validation set, one fourth for the testing set and one half for the training set. Hence, 114 input sets were used for the training. A total of 56 input sets and 57 inputs sets were used for the validation and testing, respectively. The experimental data was loaded into the workspace at random. The sets were picked as equally spaced points throughout the random original data. The training set enables to train the network on a representative set of input/target pairs and get good forecasting results. This property of the ANN provides to get a new output similar to the correct output. The function train carries out such a loop of calculation. In each pass, the function train proceeds through the specified sequence of inputs, calculating the output, error and network adjustment for each input vector in the sequence as the inputs are presented.

3 Results and discussion

3.1 Selection of backpropagation (BP) algorithm

It is very difficult to know which training algorithm will be the fastest for a given problem. It will depend on many factors, including the complexity of the problem and the number of data points in the training set (Yetilmezsoy and Saral 2007). In general, on networks, which contain up to a few hundred weights the Levenberg–Marquardt algorithm (LMA) will have the fastest convergence. This advantage is especially noticeable if very accurate training is required. The Quasi-Newton methods are often the next fastest algorithms on networks of moderate size. The BFGS algorithm is generally faster than the conjugate gradient algorithms. Of the conjugate gradient algorithms, the Powell–Beale procedure requires the most storage, but usually has the fastest convergence. The Polak–Ribiére has performance similar to the Powell–Beale. It is difficult to predict which algorithm will perform best on a given problem. The storage requirements for Polak-Ribiére (4 vectors) are slightly larger than for Fletcher–Reeves (3 vectors). The Fletcher-Reeves generally converges in fewer iterations than Rprop even though there is more computation required in each iteration. Rprop and the scaled conjugate gradient algorithm do not require a line search and have small storage requirements. They are reasonably fast, and are very useful for large problems. The variable learning rate algorithm is usually much slower than the other methods, and has about the same storage requirements as Rprop, but it can still be useful for some problems. The one step secant algorithm requires less storage and computation per epoch than the BFGS algorithm. It requires slightly more storage and computation per epoch than the conjugate gradient algorithms. It can be considered a compromise between full Quasi-Newton algorithms and conjugate gradient algorithms. In Batch Gradient methods, the weights and biases are updated in the direction of the negative gradient of the performance function.

A number of benchmark comparisons of the various training algorithms were performed in this study. 11 BP algorithms were compared to select the best fitting BP algorithm (Table 4). For all BP algorithms, a three-layer network with a tangent sigmoid transfer function (tansig) at hidden layer and a linear transfer function (purelin) at output layer, were used. Ten neurons were used at the hidden layer for all BP algorithms. Results of benchmark comparisons showed that the LMA, with a minimum mean squared error (MSE) was selected as the best of 11 BP algorithms. The comparison between BP algorithms is illustrated in Fig. 3.

Table 4 Comparison of 11 BP algorithms with ten neurons in the hidden layer
Fig. 3
figure 3

The comparison between backpropagation (BP) algorithms

3.2 Optimization of neural network structure

Almasri and Kaluarachchi (2005) reported that optimization of a neural network is an important task of neural network-based studies. This operation plays an important role in the performance of the network. Hence, an optimization was carried out between the neuron number and MSE. Then, the three-layer neural network were evaluated by the best BP algorithm selected as the best of 11 BP algorithms (Table 4).

In optimization of the network, two neurons were used in the hidden layer as an initial guess. With increasing of neuron number, the network gave several local minimum values and different MSE values were obtained for the training set. However, increasing neuron numbers to more than 12 caused an unrealistic result and mean squared error began to increase. Hence, the optimal neuron number for the LMA was found to be 12 neurons (MSE 0.0223/0). Figure 4 illustrates the dependence between the neuron number and MSE. The training stopped after 11 iterations (TRAINLM, Epoch 11/100) for the LMA because the differences between training error and validation error started to increase. Detailed descriptive statistics of the proposed ANN model subsets are given in Table 5. The optimal neural network structure, together with a flowchart of the BP algorithm, is shown in Fig. 5: a three-layer network, with tangent sigmoid transfer function (tansig) at hidden layer with 12 neurons and a linear transfer function (purelin) at output layer. Their mathematical definitions of tansig and purelin are given in Eqs. (2) and (3):

$$ f{\left( n \right)} = \frac{2} {{{\left( {1 + {\text{e}}^{{ - 2n}} } \right)}}} - 1 $$
(2)
$$ f{\left( n \right)} = n $$
(3)
Fig. 4
figure 4

The dependence between MSE and number of neurons at hidden layer for the LMA

Table 5 Detailed descriptive statistics of the proposed ANN model subsets
Fig. 5
figure 5

Optimal ANN structure, together with a flowchart of the BP algorithm for the prediction of CODRE

Although a linear transfer function (purelin) was used in the output layer, there was no negative ANN estimation present in ANN outputs. This can be attributed to the characteristics of the input vector used in this study. However, it is well-known that the linear output layer with a linear transfer function sush as purelin lets the network produce values outside the range −1 to +1. Therefore, negative ANN estimations may be expected depending on the characteristics of the input vector. On the other hand, if it is desirable to constrain the outputs of a network (such as between 0 and 1) then the output layer should use a sigmoid transfer function (such as logsig). Therefore, the linear activation function of the output neuron is replaced by a logarithmic sigmoid transfer function of Eq. (4) whose output is always positive (Huang 2005).

$$ f{\left( n \right)} = \frac{1} {{{\left( {1 + {\text{e}}^{{ - n}} } \right)}}} $$
(4)

In some cases, depending on the characteristics of data set, higher MSE values can be observed for an ANN structure with a sigmoid transfer function (logsig) at output layer compared to an ANN structure with a linear transfer function (purelin) at output layer. Because the optimal architecture of the ANN model and its parameter variation are determined based on the minimum value of the MSE, the second ANN structure is preferable even though the possible negative estimations do not match the actual outputs, which are always larger than zero. This time, negative estimations can be normally set to zero in practice. This was applied in some computational studies (Zhang 2004; López et al. 2007; Crujeiras and Fernández-Casal 2006). However, Cigizoglu et al. (2007) reported that generalized regression neural network (GRNN) method could be a good technique to avoid negative estimations even though it could provide some overestimations for some parts of the data. They observed that feed forward back propagation (FFBP), radial basis function (RBF) and multi linear regression (MLR) methods provided estimations having negative values for some of data. For the GRNN method, on the other hand, this problem was not seen by authors.

A regression analysis of the network response between the output and the corresponding target was performed. The linear regression between the network outputs and the corresponding targets showed that the neural network outputs (forecasted data) were obviously agreed with the experimental data. The correlation between ANN testing outputs and the experimental data is depicted in Fig. 6. The agreement between testing outputs and the experimental data is shown in Fig. 7. ANN testing outputs showed a very small deviation in CODRE with a maximum deviation of about 0.1218 from the experimental data (R 2 = 0.8245).

Fig. 6
figure 6

The correlation between ANN testing outputs and the experimental data

Fig. 7
figure 7

Agreement between ANN testing outputs and the experimental data

3.3 Linear-nonlinear study

In this section, the experimental data was also evaluated by DataFit® scientific software (version 8.1.69, Copyright © 1995–2005 Oakdale Engineering) and results were compared with neural network outputs and experimental values. The Levenberg–Marquardt method with double precision was performed in the linear-nonlinear study. As regression models are solved, they were sorted automatically according to the goodness of fit criteria.The experimental data was imported directly from Microsoft® Excel used as an open database connectivity data source and regression analysis was performed. Two linear models and one exponential model were obtained from the regression results. Results are summarized in Table 6.

Table 6 Summary of regression results

The coefficients (a, b, c, d, e, f, g, h, and i), constant term (j) and independent variables (x 1, x 2, x 3, x 4, x 5, x 6, x 7, x 8 and x 9) in terms of original operating variables used in the best-fit model and regression variable results including standard error, the t-statistics and the corresponding P values for each parameter are given in Table 7. The best-fit model defined as a function of nine process variables [CODRE = f(HRT, CODi, pH, T, ALK, VFA, DR, OLR, TSS)] is given in Eq. (5).

$$ \begin{aligned}{} & {\rm CODRE} = {\left( {2.652 \times 10^{{ - 2}} } \right)}{\rm HRT} + {\left( {1.385\times 10^{{ - 4}}} \right)}{\rm COD}_{\rm i} + {\left( {8.167 \times 10^{{ - 2}} } \right)}{\rm pH} \\ & {\text{ }} + {\left( {0.116} \right)}{\rm T} + {\left( { - 1.797 \times 10^{{ - 4}} } \right)}{\rm ALK} \\ & {\text{ }} + {\left( { - 5.048 \times 10^{{ - 4}} } \right)}{\rm VFA} + {\left( {0.05214} \right)}{\rm DR} \\ & {\text{ }} + {\left( { - 0.294} \right)}{\rm OLR} + {\left( { - 5.63 \times 10^{{ - 4}} } \right)}{\rm TSS} - 4.362 \\ \end{aligned} $$
(5)
Table 7 Model components and regression variable results for the best-fit model

In linear-nonlinear study, the aim was to investigate the effects of nine input variables on CODRE values. Hence, a regression analysis was carried out. Results showed the significance of each variable. T-ratio represents the ratio of the estimated parameter value to the estimated parameter standard deviation. The larger ratio indicates the more significant parameter in the regression model. Moreover, the variable with the lowest P value is considered the most significant. Briefly, T-ratios and P values showed that the CODi concentration, HRT, operating temperature, TSS concentration and pH had more importance than the OLR, alkalinity and VFA concentration for this model structure.

Figure 8 shows the agreement between the regression model outputs and the experimental data for 227 observations. Regression model outputs showed a reasonable CODRE deviations with a maximum deviation of about 0.2845 from the experimental data (R 2 = 0.685). Residual errors the regression model outputs and the experimental data are shown in Fig. 9. To evaluate the model performance, descriptive statistics of the residual errors are given in Table 8.

Fig. 8
figure 8

Agreement between the regression model outputs and the experimental data

Fig. 9
figure 9

The residual errors obtained from 227 observations for the regression model outputs

Table 8 Descriptive statistics of the residuals errors

3.4 Comparison of models

Results of stochastic modeling studies showed that the ANN model produced smaller deviation and exhibited a better predictive performance on forecasting of CODRE values, compared to regression model outputs. Minimum and maximum residuals between regression model outputs and the experimental data were determined to be −0.2539 and 0.2845, respectively. However, ANN testing outputs showed a smaller deviations from the experimental data with a minimum and maximum residuals of −0.1190 and 0.1218, respectively. Although R 2 was found about 0.685 for the regression model, the correlation coefficient between ANN testing outputs and measured values was calculated to be 0.8245, which can be regarded a good fit. This can be attributed to the advantage of ANNs on complex interactions between the inputs and the output, without requiring a mathematical model and any prior knowledge of a solution (Stanley 1988; McCann 2005). Similarly, Roth (1988) and Stevenson (1991) emphasized that ANN techniques could be a good alternative to statistical and theoretical techniques and also iterative problems because of their speed and capability of learning, robustness, predictive capabilities, non-linear characteristics, non-parametric regression capabilities, generalization properties and easiness of working with high-dimensional data. Figure 10 shows a head-to-head comparison of performance for experimental data, ANN testing outputs and regression model outputs.

Fig. 10
figure 10

A head-to-head comparisons of performances for experimental data, ANN testing outputs and regression model outputs

4 Conclusions

The real cotton textile wastewater diluted with domestic wastewater was satisfactorily treated by means of high-rate anaerobic processes, specifically with the use of UASB reactors. UASB reactors showed a remarkable performance on total COD removals with a maximum treatment efficiencies of 60 and 80% for the corresponding HRTs of 4.5 and 9.0 days, respectively. The ANN model used in this study showed a precise and an effective prediction of CODRE values with a saticfactory correlation coefficient of 0.8245 for nine different process parameters. The LMA was found as the best of 11 BP algorithms. The optimal neuron numbers for the LMA were determined to be 12 with a MSE of 0.0223/0. Regression variable results including T-ratios and P values showed that influent COD concentration, HRT, operating temperature, TSS concentration and pH were found to be more important than organic loading rate, alkalinity and VFA concentration on CODRE values for this model structure.

Even though the hydraulic characteristics of the anaerobic process is very complicated, a number of attempts in developing prediction models can help to develop a better understanding of the process. Choosing the most appropriate model representing the extension of the experimental data can help to recognize possible technical faults and to reduce operating costs of plants in the planning stage. Therefore, ANN outputs can be evaluated for different operating data before transferring the concepts to a full scale plant. Based on the ANN predictions on CODRE values, effluent water quality will be evaluated in a speedy and practical manner with respect to discharge standards. In addition, imposed organic and hydraulic loading rates will be selected accurately depending on the desired treatment efficiency. In some cases, it should be noted that the effluent quality in terms of COD obtained may not be meet with the effluent discharge standards depending on the variations in feeding wastewater characteristics. Therefore, a post-treatment is needed to guarantee the desired effluent quality. From this point of view, ANN predictions on CODRE values will also helps the control engineer to determine the required loading capacity on the subsequent treatment units. This will help to reduce operating costs of plants in the planning stage.

On the basis of the advantages of ANNs pointed in this study, the future study will be focused on modeling of UASB system treating different types of real textile wastewaters such as polyester and wool textile manufacturing. Furthermore, different input variables will be tested to observe the effects of each model component on CODRE values.