Introduction

Surface water quality is a major concern today because of its effects on human health and aquatic ecosystems. As population grows, the pressure on water resources increases. Surface water quality in a region is largely determined both by natural processes, including the lithology of the basin, atmospheric inputs, and climatic conditions, and by anthropogenic inputs such as municipal and industrial wastewater discharge. Rivers, in turn, play a major role in assimilating or transporting municipal and industrial wastewater and runoff from agricultural land. Municipal and industrial wastewater constitutes a constant pollution source, whereas surface runoff is a seasonal phenomenon, largely affected by climate within the basin (Singh et al. 2004).

Dissolved oxygen (DO) is one of the most important water quality parameters and a significant indicator of the status of an aquatic ecosystem. The sources of DO in a water body include re-aeration from the atmosphere, photosynthetic oxygen production, and DO loading. The sinks include oxidation of carbonaceous and nitrogenous material, sediment oxygen demand, and respiration by aquatic plants (Kuo et al. 2007). Identifying and quantifying the DO profiles of rivers is one of the primary concerns of water resources managers.

Several DO models, both deterministic and stochastic, have been developed to support best management practices for conserving DO in water bodies (Ansa-Ansare et al. 2000; García et al. 2002; Wang et al. 2003; Hull et al. 2008; Shukla et al. 2008). Most of these models are complex and require many different input data that are not easily accessible, which makes modeling expensive and time-consuming (Suen et al. 2003). Artificial neural networks (ANNs) are flexible modeling tools capable of learning the mathematical mapping between the input and output variables of nonlinear systems and of generalizing to control, classification, and prediction tasks. They provide a neural computing approach to solving complex problems. In the last decade, ANNs have been widely and successfully applied to various water resources problems, such as hydrological processes (Nayak et al. 2004; Sahoo et al. 2005; Dastorani et al. 2010; Guo et al. 2011; Wu and Chau 2011; Senkal et al. 2012), water resources management (Kralisch et al. 2003; Sreekanth and Datta 2010), groundwater problems (Daliakopoulos et al. 2005; Dixon 2005; Garcia and Shigidi 2006; Nayak et al. 2006; Ghose et al. 2010; Banerjee et al. 2011), and water quality (Ha and Stenstrom 2003; Kuo et al. 2006; Anctil et al. 2009; da Costa et al. 2009; Dogan et al. 2009; Chang et al. 2010; He et al. 2011). ANNs have also been used for modeling and forecasting DO (Kuo et al. 2007; Singh et al. 2009; Ranković et al. 2010; Najah et al. 2011). Furthermore, intelligent algorithms such as the genetic algorithm have been combined with ANN models for managing watershed water quality (Kuo et al. 2006). As an effective tool for computing water quality, ANNs can be regarded as a powerful predictive alternative to traditional modeling techniques.

In the arid northwest of China, water resources play a dominant role in economic development, and careful management is important for ecological and environmental protection (Wen et al. 2007). Because of the extensive use of surface water, its quality has also been affected over the last few decades (Wang et al. 1999). Predicting the evolution of surface water quality in these arid regions can improve the understanding of river water systems and help decision makers manage water resources effectively. The main purpose of this study is to analyze and discuss the performance of ANNs in modeling DO in the Heihe River.

Material and methods

Study area

The Heihe River in northwestern China is one of the largest inland rivers in China, covering an area of 1.3 × 10⁵ km². It originates from the Qilian Mountains and flows through the Zhangye basin and the lower reaches (also known as the Ejina Basin) (Fig. 1). The middle reaches of the Heihe River, from the mountain outlet (Yingluoxia) to the end of the middle reaches (Zhengyixia), are 185 km in length with an average slope of 2 % and cover an area of 1.08 × 10⁴ km², including Zhangye City, Linze County, and Gaotai County (Fig. 1). This area has an arid continental climate with a mean annual temperature of 3–7 °C. The average annual precipitation ranges from 50 to 150 mm, with the majority (∼80 %) falling from June to September. The average annual potential evaporation is 2,000–2,200 mm (Gao 1991).

Fig. 1 Location of the study area and the water quality monitoring stations

Water sampling procedure

The data were collected from three water quality monitoring stations: Yingluoxia, Gaoai, and Zhengyixia (Fig. 1). The Yingluoxia station is located at the entrance of the middle reaches of the Heihe River, at the upstream end of the study area. This station receives pollution from nonpoint sources, mostly agricultural activities, and represents relatively low river pollution. The Gaoai and Zhengyixia stations are located in the central part and at the end of the middle reaches, respectively, at downstream sites of the study area, and represent the high-pollution part of the river. These stations receive pollution from point and nonpoint sources, i.e., agricultural and livestock farms, domestic wastewater, and surface runoff from villages. River water quality was monitored monthly at the three sites over 6 years (2003–2008) for nine water quality parameters. Although more than 20 water quality parameters were available, only nine were selected because they were measured continuously at all selected monitoring stations. The selected water quality parameters were pH, electrical conductivity (EC, microsiemens per centimeter), chloride (Cl, milligrams per liter), calcium (Ca2+, milligrams per liter), total alkalinity (TA, milligrams per liter), total hardness (TH, milligrams per liter), nitrate nitrogen (NO3-N, milligrams per liter), ammoniacal nitrogen (NH4-N, milligrams per liter), and dissolved oxygen (DO, milligrams per liter). DO, EC, and pH were measured in the field with a portable multi-parameter water quality analyzer, and the analytical precision of DO was within ±2 %. The other water quality parameters were determined using Standard Methods (APHA 1995).

The independent water quality parameters showed coefficients of variation between 2.69 and 151.17 % (Table 1). Such variability among the samples may be attributed to the large geographical variations in climate and to seasonal influences in the study area. pH showed the lowest variation. Compared with parameters of anthropogenic origin, which showed high variation, parameters of natural origin varied less owing to the buffering capacity of the river. The correlation coefficient between DO and each input parameter was calculated and is presented in Table 1.

Table 1 Basic statistics of the measured water quality parameters in Heihe River

The available data are generally divided into training, validation, and testing subsets to develop an ANN model. The training set is used to estimate the unknown connection weights; the validation set is used to decide when to stop training in order to avoid overfitting and/or which network structure is optimal; and the test set is used to assess the generalization ability of the trained model (Maier et al. 2010). In this study, the complete river water quality data set (164 samples × 8 variables) was randomly divided into training, validation, and test sets of 100 (about 60 %), 32 (about 20 %), and 32 (about 20 %) samples, respectively. The output variable (DO) corresponding to each set of input variables belonged to the same water sample and was thus measured at the same time and place.
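The random partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_indices` and the fixed seed are assumptions, since the text does not state how the randomization was carried out.

```python
import random

def split_indices(n_samples, n_train, n_val, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # assumed fixed seed, so the split is reproducible
    rng.shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains after training and validation
    return train, val, test

# 164 samples split into 100 / 32 / 32, as in the study
train_idx, val_idx, test_idx = split_indices(164, 100, 32)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because each index appears in exactly one subset, the input variables and the DO value of a given water sample always stay together.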

In view of the requirements of the neural computation algorithm, the raw data of both the input and output variables were normalized by transformation. All the variables were normalized to the range from −1 to 1 using the following equation:

$$ {x_{\mathrm{n}}} = 2 \times \frac{{{x_{\mathrm{i}}} - {x_{{\min }}}}}{{{x_{{\max }}} - {x_{{\min }}}}} - 1 $$
(1)

where \( x_{\mathrm{n}} \) and \( x_{\mathrm{i}} \) represent the normalized and original training, test, and validation data; \( x_{\min} \) and \( x_{\max} \) denote the minimum and maximum of the training, test, and validation data.
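As a minimal sketch of Eq. (1), the following functions scale one variable to [−1, 1] and invert the transformation so that model outputs can be returned to original units. The function names and the sample DO values are hypothetical and serve only as illustration.

```python
def normalize(x, x_min, x_max):
    """Eq. (1): map x from [x_min, x_max] onto [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(x_n, x_min, x_max):
    """Inverse of Eq. (1): map a normalized value back to original units."""
    return (x_n + 1.0) / 2.0 * (x_max - x_min) + x_min

# Hypothetical DO values in mg/L (not taken from the study)
do_values = [6.2, 7.8, 9.1, 10.4]
x_min, x_max = min(do_values), max(do_values)

scaled = [normalize(v, x_min, x_max) for v in do_values]
restored = [denormalize(s, x_min, x_max) for s in scaled]
print(scaled[0], scaled[-1])
```

The minimum maps to −1 and the maximum to 1. A variable with identical minimum and maximum would cause a division by zero and would need separate handling.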

Artificial neural network modeling

An artificial neural network (ANN) is a mathematical structure designed to mimic the information processing functions of a network of neurons in the brain (Hinton 1992; Jensen 1994). ANNs are highly parallel systems that process information through many interconnected units that respond to inputs through modifiable weights, thresholds, and mathematical transfer functions. Each unit processes the pattern of activity it receives from other units and then broadcasts its response to still other units. ANNs are particularly well suited for problems in which large data sets contain complicated nonlinear relations among many different inputs. ANNs are able to find and identify complex patterns in data sets that may not be well described by a set of known processes or simple mathematical formulae.

Multilayer perceptron neural network

Among the various types of ANNs that have been developed over the years, the multilayer perceptron (MLP) neural network structure is the most commonly used and well-researched class of ANNs (Ouarda and Shu 2009). A feed forward MLP network consists of an input layer which receives the values of the input variables, an output layer which provides the model output, and one or more hidden layers. Nodes in each layer are interconnected through weighted acyclic arcs from each preceding layer to the following, without lateral or feedback connections (Shu and Ouarda 2007). Principe et al. (2000) emphasize that the main advantage of MLPs is that they are easy to use, and that the key disadvantages are that they train slowly, require a large amount of training data, and easily get stuck in a local minimum. However, an MLP with a sufficient number of hidden units can approximate any continuous function to a prespecified accuracy; in other words, MLP networks are universal approximators (Cherkassky and Mulier 1998).

Back-propagation neural network and learning algorithm

It is well recognized that a neural network with one hidden layer can approximate any finite nonlinear function with high accuracy, and systems with three or more hidden layers are known to cause unnecessary computational overload (Kim and Gilley 2008). Hence, an MLP network with one hidden layer, trained by the back-propagation (BP) algorithm, was used to build the ANN model of river water DO with eight input variables, as shown in Fig. 2. The activation functions are a tan-sigmoid function in the hidden layer and a linear function in the output layer. The mathematical expression of the MLP is as follows:

Fig. 2 General conceptual neural network for the DO in the Heihe River

$$ {\overline x_j} = \sum\limits_i {{w_{ij}}{x_i}} + {w_j} $$
(2)
$$ {x_j} = f\left( {{{\overline x}_j}} \right) = \frac{1}{{1 + {e^{{ - {{\overline x}_j}}}}}} $$
(3)

where \( x_i \) is the output of node i located in any one of the previous layers, \( w_{ij} \) the weight associated with the link connecting nodes i and j, and \( w_j \) the bias of node j.
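A forward pass through a single-hidden-layer network built from Eqs. (2) and (3) can be sketched as below. The sketch uses the logistic function exactly as printed in Eq. (3) for the hidden nodes and a linear output node; replacing `logistic` with `math.tanh` would give a tan-sigmoid hidden layer instead. All function names and the example weights are hypothetical.

```python
import math

def logistic(z):
    """Eq. (3): logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias (Eq. 2), passed through Eq. (3)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(net)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of logistic nodes followed by a single linear output node."""
    hidden = [node_output(x, w, b) for w, b in zip(w_hidden, b_hidden)]
    y = sum(w * h for w, h in zip(w_out, hidden)) + b_out  # linear output, no squashing
    return hidden, y

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output
hidden, y = forward(
    x=[0.0, 0.0],
    w_hidden=[[1.0, -1.0], [0.5, 0.5]],
    b_hidden=[0.0, 0.0],
    w_out=[2.0, 4.0],
    b_out=1.0,
)
print(hidden, y)
```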

Since the weights \( w_{ij} \) are actually internal parameters associated with each node i, changing the weights of a node will alter the behavior of the node and in turn alter the behavior of the whole back-propagation MLP. First, a squared error measure for the pth input–output pair is defined as:

$$ {E_p} = {\sum\limits_k {\left( {{d_k} - {x_k}} \right)}^2} $$
(4)

where \( d_k \) is the desired output for node k, and \( x_k \) is the actual output for node k when the input part of the pth data pair is presented. To find the gradient vector, an error term \( e_j \) for node j is defined as:

$$ {e_j} = \frac{\partial {E_p}}{\partial {\overline x_j}} = \frac{\partial \sum\limits_k {\left( {d_k} - {x_k} \right)^2}}{\partial {\overline x_j}} $$
(5)

By the chain rule, the recursive formula for e j can be written as:

$$ {e_j} = \begin{cases} - 2\left( {d_j} - {x_j} \right)\dfrac{\partial {x_j}}{\partial {\overline x_j}} = - 2\left( {d_j} - {x_j} \right){x_j}\left( 1 - {x_j} \right) & \mathrm{if\ node}\ j\ \mathrm{is\ an\ output\ node} \\ \dfrac{\partial {x_j}}{\partial {\overline x_j}}\sum\limits_{k,\,j < k} \dfrac{\partial {E_p}}{\partial {\overline x_k}}\dfrac{\partial {\overline x_k}}{\partial {x_j}} = {x_j}\left( 1 - {x_j} \right)\sum\limits_{k,\,j < k} {e_k}{w_{jk}} & \mathrm{otherwise} \end{cases} $$
(6)

where \( w_{jk} \) is the connection weight from node j to k and \( w_{jk} \) is zero if there is no direct connection. Then, the weight update \( \Delta w_{jk} \) for off-line learning is:

$$ \varDelta {w_{{jk}}} = - \eta \frac{{\partial E}}{{\partial {w_{{jk}}}}} = - \eta \sum\limits_p {\frac{{\partial {E_p}}}{{\partial {w_{{jk}}}}}} $$
(7)

where \( \eta \) is a learning rate that affects the convergence speed and stability of the weights during learning. In vector form,

$$ \varDelta w = - \eta {\nabla_w}E $$
(8)

where \( E = \sum_p E_p \). This corresponds to using the true gradient direction based on the entire data set. To speed up training, a momentum term is added:

$$ \varDelta w = - \eta {\nabla_w}E + \alpha \varDelta {w_{\mathrm{prev}}} $$
(9)

where \( \Delta w_{\mathrm{prev}} \) is the previous update amount and \( \alpha \) is the momentum constant. For details of the back-propagation MLP, interested readers can refer to the literature on neural network theory (Freeman and Skapura 1991; Jang et al. 1997; Kuo et al. 2006).
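The error terms and the momentum update derived above can be sketched in a few lines. The sketch assumes the logistic activation of Eq. (3) at every node, so that the derivative is x(1 − x). The momentum constant of 0.9 and the toy error function E(w) = w² are assumptions for illustration, as are all function names.

```python
def output_delta(d, x):
    """Error term of an output node with logistic activation (first case of Eq. 6)."""
    return -2.0 * (d - x) * x * (1.0 - x)

def hidden_delta(x, downstream_deltas, downstream_weights):
    """Error term of a non-output node (second case of Eq. 6)."""
    back = sum(e * w for e, w in zip(downstream_deltas, downstream_weights))
    return x * (1.0 - x) * back

def momentum_step(weights, grads, prev_updates, eta=0.01, alpha=0.9):
    """Eq. (9): update = -eta * gradient + alpha * previous update."""
    updates = [-eta * g + alpha * d for g, d in zip(grads, prev_updates)]
    new_weights = [w + u for w, u in zip(weights, updates)]
    return new_weights, updates

# Toy problem (an assumption, not the DO model): minimize E(w) = w^2, gradient 2w
w, prev = [1.0], [0.0]
for _ in range(200):
    grad = [2.0 * w[0]]
    w, prev = momentum_step(w, grad, prev)
print(w[0])
```

Setting `alpha=0.0` reduces the step to plain gradient descent as in Eq. (8).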

There are several optimization methods to improve the convergence speed and the performance of network training. In this paper, the Bayesian regularization BP algorithm was selected. Bayesian regularization is an algorithm that automatically sets optimum values for the parameters of the objective function. In this approach, the weights and biases of the network are assumed to be random variables with specified distributions, and statistical techniques are used to estimate the regularization parameters, which are related to the unknown variances. The advantage of this algorithm is that it reduces the risk of overfitting regardless of the size of the network. Bayesian regularization has been used effectively (Porter et al. 2000; Coulibaly et al. 2001a, b; Anctil et al. 2004; Krishna et al. 2008). A more detailed discussion of Bayesian regularization can be found in the literature (MacKay 1992).
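The objective function referred to above is not written out in the text. In the common formulation of Bayesian regularization, it is a weighted sum of the squared errors and the squared network weights, and the two weighting factors are the regularization parameters that the algorithm estimates. The sketch below only evaluates such an objective for given factors; it does not estimate them. The names `data_factor` and `weight_factor` are hypothetical, and the factor on the weights is unrelated to the momentum constant of Eq. (9).

```python
def regularized_objective(errors, weights, data_factor, weight_factor):
    """Weighted sum of squared errors and squared weights (assumed common form)."""
    sum_sq_errors = sum(e * e for e in errors)
    sum_sq_weights = sum(w * w for w in weights)
    return data_factor * sum_sq_errors + weight_factor * sum_sq_weights

# Hypothetical residuals and weights
value = regularized_objective(errors=[1.0, 2.0], weights=[3.0],
                              data_factor=2.0, weight_factor=0.5)
print(value)
```

A larger factor on the weights penalizes large weights more strongly and gives a smoother network response, which is how the method limits overfitting.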

Determining the number of neurons in the hidden layer is an important task when designing an ANN (Shu and Ouarda 2007). Too many hidden nodes may lead to overfitting, whereas too few may cause underfitting. The recommended number of nodes in a hidden layer ranges from \( 2\sqrt{n} + m \) to \( 2n + 1 \), where n is the number of input nodes and m is the number of output nodes (Fletcher and Goss 1993). In this study, a trial and error procedure for the hidden node selection was carried out by gradually varying the number of nodes in the hidden layer.
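For the network in this study (eight inputs, one output), the recommended range can be evaluated directly. The function name is hypothetical.

```python
import math

def hidden_node_range(n_inputs, n_outputs):
    """Recommended bounds on hidden nodes: 2*sqrt(n) + m up to 2n + 1."""
    lower = 2.0 * math.sqrt(n_inputs) + n_outputs
    upper = 2 * n_inputs + 1
    return lower, upper

lower, upper = hidden_node_range(8, 1)
print(round(lower, 2), upper)  # 6.66 17
```

The bounds are therefore roughly 7 to 17 hidden nodes for eight inputs and one output.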

Three factors are associated with the weight optimization algorithm during training: (1) the initial weight matrix, (2) the learning rate, and (3) the stopping criteria, such as (a) a fixed maximum number of epochs, (b) a target error goal, and (c) a minimum performance gradient. The initial weights were randomly generated between −1 and 1 with a random number generator. The value of the learning rate is not fixed in advance. Maier and Dandy (1998, 2000) reported that the optimal learning rate is highly problem dependent and should be selected so that oscillation on the error surface is avoided. Hagan et al. (1996) demonstrated that learning became unstable for higher values (>0.035). Thus, the learning rate was set to 0.01.

The mean square error (MSE) can be used to determine how well the network output fits the desired output; smaller values indicate better performance. The MSE is defined as follows:

$$ {\mathrm{MSE}} = \frac{1}{n}\sum\limits_{{i = 1}}^n {{{\left( {{O_i} - {P_i}} \right)}^2}} $$
(10)

where n is the number of input samples, and \( O_i \) and \( P_i \) are the measured and network output values of the ith element, respectively. The maximum number of epochs, the target error goal (MSE), and the minimum performance gradient were set to 10⁵, 10⁻⁵, and 10⁻⁵, respectively. Training stops when any of these conditions occurs. All the computations were performed using MATLAB software (MathWorks, Inc., Natick, MA).

Statistical forecasting of ANN model

The performance of developed models can be evaluated using several statistical tests that describe the errors associated with the model. The MSE, the coefficient of correlation (r), and the root mean square error (RMSE) were used to provide an indication of goodness of fit between the measured and modeled values.

Coefficient of correlation is defined as the degree of correlation between the measured and modeled values:

$$ r = \frac{{\sum\limits_{{i = 1}}^n {\left( {{P_i} - \overline P } \right)\left( {{O_i} - \overline O } \right)} }}{{\sqrt {{\sum\limits_{{i = 1}}^n {{{\left( {{P_i} - \overline P } \right)}^2}\sum\limits_{{i = 1}}^n {{{\left( {{O_i} - \overline O } \right)}^2}} } }} }} $$
(11)

The RMSE can be calculated as follows:

$$ {\mathrm{RMSE}} = \sqrt {{\frac{1}{n}\sum\limits_{{i = 1}}^n {{{({O_i} - {P_i})}^2}} }} $$
(12)

where n is the number of input samples; \( O_i \) and \( P_i \) are the measured and network output values of the ith element, respectively; and \( \overline O \) and \( \overline P \) are their respective averages.
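The three statistics of Eqs. (10) to (12) can be sketched as plain functions. The function names and the sample values are hypothetical and are not data from the study.

```python
import math

def mse(observed, predicted):
    """Eq. (10): mean of the squared differences."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n

def rmse(observed, predicted):
    """Eq. (12): square root of the MSE."""
    return math.sqrt(mse(observed, predicted))

def correlation(observed, predicted):
    """Eq. (11): coefficient of correlation between measured and modeled values."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    cov = sum((p - p_mean) * (o - o_mean) for o, p in zip(observed, predicted))
    ss_p = sum((p - p_mean) ** 2 for p in predicted)
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    return cov / math.sqrt(ss_p * ss_o)

# Hypothetical measured and modeled DO values (mg/L): a constant offset of 0.5
observed = [7.0, 8.0, 9.0, 10.0]
predicted = [7.5, 8.5, 9.5, 10.5]
print(mse(observed, predicted), rmse(observed, predicted), correlation(observed, predicted))
```

The example illustrates why r and RMSE are reported together: a constant bias leaves r at 1 while the RMSE still registers the 0.5 mg/L offset.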

Results and discussion

DO model result

The optimum number of neurons was determined based on the minimum MSE value of the training data set. The BP MLP-NN was trained with the number of hidden neurons varied from 5 to 17. Each architecture configuration was trained 50 times with different initializations, and then the best network was retrained to calculate the overall accuracy. According to the relationship between the number of neurons and the MSE during training, the MSE decreased to 0.1825 when 14 neurons were used. Thus, 14 neurons were selected as the best number of neurons.

The selected ANN for the DO model was composed of one input layer with eight input variables, one hidden layer with 14 neurons, and one output layer with one output variable. The coefficient of correlation (r) and RMSE were computed for the training, validation, and test data sets used for the DO model and are presented in Table 2. Figure 3 shows the fit between the measured and modeled values of DO in the training, validation, and testing sets. The coefficient of correlation (r) values for the training, validation, and test sets were 0.9654, 0.9841, and 0.9680, respectively. The corresponding RMSE values were 0.4272, 0.3667, and 0.4570. The measured and modeled DO concentrations in the Heihe River followed a closely similar pattern of variation (Fig. 3), and the coefficient of correlation (r) and RMSE values suggest a good fit of the DO model to the data set.

Table 2 Performance parameters of the artificial neural network model

Fig. 3 Comparison of the measured and modeled DO values in (a) training, (b) validation, and (c) testing sets

Sensitivity analysis

To evaluate the effect of the input variables on the DO model, two evaluation processes were used. Firstly, the performance of various combinations of the parameters was evaluated using the coefficient of correlation (r) and RMSE to determine which variables most affect the output. For each combination of parameters, the optimal network was selected as the one with the minimum MSE using 14 neurons. Overall, nine networks were compared, as shown in Table 3. Each one shows the extent to which the eliminated variable affects the network accuracy. The precision of the model became higher when Cl was eliminated from the input variables, with a minimum RMSE of 0.4712 and a coefficient of correlation (r) of 0.9691 for the testing data set. Therefore, Cl could be excluded. Conversely, the coefficient of correlation (r) decreased when any other input parameter was removed, which reduced the modeling capability of the ANN. Furthermore, DO was found to be sensitive to the pH, NH4-N, and NO3-N variables.

Table 3 Modeling accuracy when each input variable was eliminated from the model

Secondly, the neural net weight matrix was used to assess the relative importance of the input variables (Garson 1998; Elmolla et al. 2010). In this study, the proposed network consisted of eight variables. Assuming the connection weights from the input nodes to the hidden nodes demonstrate the relative predictive importance of the independent variable, the importance of each input variable can be expressed as follows:

$$ {I_j} = \frac{\sum\limits_{m = 1}^{N_h} \left( \dfrac{\left| W_{jm}^{ih} \right|}{\sum\limits_{k = 1}^{N_i} \left| W_{km}^{ih} \right|} \times \left| W_{mn}^{ho} \right| \right)}{\sum\limits_{k = 1}^{N_i} \left\{ \sum\limits_{m = 1}^{N_h} \left( \dfrac{\left| W_{km}^{ih} \right|}{\sum\limits_{l = 1}^{N_i} \left| W_{lm}^{ih} \right|} \times \left| W_{mn}^{ho} \right| \right) \right\}} $$
(13)

where \( I_j \) is the relative importance of the jth input variable on the output variable; \( N_i \) and \( N_h \) are the number of input and hidden neurons, respectively; W is a connection weight; the superscripts i, h, and o refer to the input, hidden, and output layers, respectively; the subscripts j, k, and l refer to input neurons; and the subscripts m and n refer to hidden and output neurons, respectively.

Table 4 shows the connection weight values for the proposed model. The relative importance of each input variable, as computed by Eq. (13), is shown in Fig. 4, which illustrates the significance of each variable compared with the others in the model. Although the network weights do not necessarily carry physical meaning, the analysis suggests that all the variables affected the prediction of DO (Singh et al. 2009): the predictor contributions ranged from 7.4 to 18.9 %, and pH, NO3-N, NH4-N, and Ca2+ had relatively high contributions (Fig. 4). pH and NO3-N were the most influential variables, with relative importances of 18.9 and 15.8 %, respectively. The most effective inputs thus included the oxygen-containing (NO3-N) and oxygen-demanding (NH4-N) variables, whereas Cl made the smallest contribution to the proposed model. These relationships indicate that high levels of dissolved organic matter consume large amounts of oxygen, and the organic matter then undergoes anaerobic fermentation, leading to the formation of ammonia and organic acids. Hydrolysis of these acidic materials causes a decrease in water pH (Vega et al. 1998; Singh et al. 2004).
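The weight-partitioning calculation used for the relative importance can be sketched as follows for a network with a single output node. Each input's share of every hidden neuron is computed from the absolute input-to-hidden weights, scaled by the absolute hidden-to-output weight, summed over the hidden neurons, and normalized so that the importances sum to 1. The function name and the small weight matrices are hypothetical and are not the values of Table 4.

```python
def relative_importance(w_ih, w_ho):
    """Relative importance of each input from absolute connection weights.

    w_ih[j][m]: weight from input j to hidden neuron m.
    w_ho[m]:    weight from hidden neuron m to the single output neuron.
    """
    n_in = len(w_ih)
    n_hid = len(w_ho)
    # Total absolute weight arriving at each hidden neuron from all inputs
    col_sums = [sum(abs(w_ih[k][m]) for k in range(n_in)) for m in range(n_hid)]
    # Contribution of input j: its share of each hidden neuron times |hidden-output weight|
    contrib = [
        sum(abs(w_ih[j][m]) / col_sums[m] * abs(w_ho[m]) for m in range(n_hid))
        for j in range(n_in)
    ]
    total = sum(contrib)
    return [c / total for c in contrib]

# Hypothetical weights: 2 inputs, 2 hidden neurons
w_ih = [[0.5, -1.0],
        [1.5, 1.0]]
w_ho = [2.0, -1.0]
importance = relative_importance(w_ih, w_ho)
print(importance)
```

Because only absolute values enter the calculation, the result indicates the magnitude of each input's influence, not its direction.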

Table 4 Connection weights between input and hidden layers (W1) and weights between hidden and output layers (W2)

Fig. 4 The relative importance of the input variables to DO ANN model for Heihe River

Conclusion

An artificial neural network was developed to simulate the dissolved oxygen (DO) concentration in the Heihe River (Northwestern China). A three-layer (one input layer, one hidden layer, and one output layer) BPNN was used with the Bayesian regularization training algorithm. The water quality variables pH, electrical conductivity (EC), chloride (Cl), calcium (Ca2+), total alkalinity (TA), total hardness (TH), nitrate nitrogen (NO3-N), and ammoniacal nitrogen (NH4-N) were used as the input data, and DO was the output of the neural network. Fourteen neurons were selected as the best number of hidden neurons based on the minimum MSE of the training data set. The trained ANN produced coefficients of correlation (r) of 0.9654, 0.9841, and 0.9680 and RMSE values of 0.4272, 0.3667, and 0.4570 for the training, validation, and test sets, respectively, showing a good match between the measured and modeled DO. The sensitivity analysis showed that the input variables pH, NO3-N, NH4-N, and Ca2+ had a strong effect on DO. pH and NO3-N were the most influential parameters, with relative importances of 18.9 and 15.8 %, respectively, while Cl made the smallest contribution to the proposed model and can be excluded. The results demonstrate that the proposed ANN model is a suitable choice for modeling DO levels with limited knowledge of the water quality parameters.