1 Introduction

A large part of the sediment in lowland rivers is transported as wash load, about 85% of which consists of silt and clay; wash load therefore plays an important role in the sediment transport of a river (Mohamad Rezapour et al. 2010, 2012). The sediment load carried by a river is one of the most important factors in the creation and formation of the delta at the river mouth, so accurate forecasting of the river sediment load can play a significant role in studying the river delta. However, given the complexity and nonlinearity of the phenomenon, classic empirical or physically based approaches generally cannot handle the problem adequately (Nourani 2009). Predicting the sediment load of a river has long been a goal of engineers, hydrologists, sedimentologists, and many other earth scientists (Leopold et al. 1992). Information on suspended sediment load is crucial to river management and environmental protection (Melesse et al. 2011). The difficulties associated with field measurement of bed load have led most researchers to develop predictive equations instead. Prediction of sediment load is required for solving problems such as determining the dead volume of a dam, sediment transport in a river, design of stable channels, estimation of aggradation and degradation around bridge piers, prediction of the effects of sand and gravel mining on riverbed equilibrium, environmental impact assessment, and dredging needs. Moreover, sediment is a major pollutant and a carrier of nutrients, pesticides, and other chemicals (Dogan et al. 2007). Watershed behavior does not follow a linear model; it is dynamic, with rapid changes occurring constantly. Conventional approaches such as clustering techniques and other statistical methods have limitations when applied to large amounts of data for classification.
Soft computing methods such as genetic algorithms (GA), artificial neural networks (ANN), and the adaptive neuro-fuzzy inference system (ANFIS) have an edge over conventional methods when dealing with nonlinear, complex data in classification. These methods have been applied in most fields of science and technology, such as estimation of oxidation parameters (Asnaashari et al. 2015), prediction of discharge coefficients and soil permeability coefficients (Emiroglu and Kisi 2013; Ganjidoost et al. 2015), the lot-sizing problem (Senyigit and Atici 2013), rainfall–runoff forecasting (Akrami et al. 2014; Tayfur and Singh 2006), soil temperature modeling (Kisi et al. 2016), and groundwater quality (Khashei-Siuki and Sarbazi 2013).

ANNs are parallel computational models that resemble biological neural networks and have good generalization capabilities. They are widely applied in forecasting hydrological and water resource variables: they can recognize patterns, find associations among various affecting factors, and use them in forecasting (Kaastra and Boyd 1996). Among ANNs, back-propagation (BP) network models are the most familiar to engineers. The so-called BP network model combines a feed-forward artificial neural network structure with the back-propagation training algorithm. The multilayer perceptron (MLP) trained with the error back-propagation (EBP) algorithm is one of the most popular ANN architectures. An MLP distinguishes itself by the presence of one or more hidden layers, whose computation nodes are called hidden neurons. By adding hidden layer(s), the network is able to extract higher-order statistics. In a rather loose sense, the network acquires a global perspective despite its local connectivity, due to the extra set of synaptic connections and the extra dimension of neural network (NN) interconnections (Haykin 1994).
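A minimal numerical sketch of such an MLP (one hidden layer with sigmoid units and a linear output) is given below. The layer sizes and random weights are purely illustrative and are not the models trained later in this paper.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with a sigmoid
    (logsig-style) hidden activation and a linear (purelin) output."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden layer activations
    return W2 @ h + b2                        # linear output layer

# Tiny example: 3 inputs -> 5 hidden neurons -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)
y = mlp_forward(np.array([0.2, -0.5, 1.0]), W1, b1, W2, b2)
print(y.shape)  # prints (1,)
```

Training such a network means adjusting the weights `W1, b1, W2, b2` to minimize the output error, which is what the back-propagation (or Levenberg–Marquardt) algorithm does.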

Many researchers have utilized soft computing methods for sediment estimation (Firat and Gongor 2010; Ab. Ghani and Azamathulla 2011, 2012; Cigizoglu 2002; Wang et al. 2008, 2009; Chang et al. 2012). Rajabi and Feyzolahpour (2012) used neuro-fuzzy models (NF), the generalized regression neural network (GRNN), the multilayer perceptron (MLP), radial basis functions (RBF), and sediment rating curves (SRC) to estimate suspended sediment load, and then compared these models with neural differential evolution (NDE). Discharge and sediment data were used as inputs to the above models, from which suspended sediment load values were estimated. Results showed that the ANFIS model performed better than the MLP, RBF, GRNN, and SRC models, but among all the models, NDE had the greatest ability to estimate suspended sediment load. Iamnarongrit et al. (2007) used BP as the learning process and a sigmoid transfer function in their study; the variance of the network was evaluated using RMSE as the genetic fitness value. They found that the neuro-genetic optimizer model forecast the sediment volume of the Lam Phra Phloeng reservoir closer to the actual values than the regression model. In the study of Wang et al. (2009), the BP configuration showed the highest statistical performance in sediment estimation when turbidity and water discharge data were used as associated input variables in the network.

ANFIS was first introduced by Jang (1993) and is a universal approximator; as such, it is capable of approximating any real continuous function on a compact set to any degree of accuracy. ANFIS combines the advantages of both ANN and fuzzy inference systems (Okkan 2012). It is a multilayer feed-forward network that uses neural network learning algorithms and fuzzy reasoning to map an input space to an output space. ANN and ANFIS have some problems when dealing with non-stationary data (Seo et al. 2015). From a study based on 346 data sets collected from the Kerayong, Kinta, Langat, and Kulim River catchments, Azamathulla et al. (2008) indicated that employing local sediment transport data yielded a network that can predict measured bed-load transport in moderately sized rivers more accurately. Ebtehaj and Bonakdari (2017) developed a new hybrid ANFIS method based on a differential evolutionary algorithm (ANFIS-DE). They employed a Gaussian membership function (MF) and compared the test results with those of the ANFIS model and a regression-based equation. They showed that the ANFIS-DE technique predicted sediment transport at the limit of deposition with higher accuracy than regression-based equations.

The genetic algorithm, an artificial intelligence tool inspired by and adapted from nature, can predict and optimize complex processes. This global optimization procedure is based on the Darwinian principle of survival of the fittest. Applied to a biological community, it is the principle by which the chances of survival of an entire community within a particular environment are increased by discarding inferior members and replacing them with superior offspring (Mohamad Rezapour et al. 2012). GAs are now used frequently to solve optimization problems, and they can handle more complex nonlinear problems than traditional gradient-based approaches (Espinoza et al. 2005). GAs have been applied to numerous engineering problems such as management of water systems (Cai et al. 2001), coastal engineering (Cha et al. 2008), river pipeline scour (Azamathulla and Ab. Ghani 2010), groundwater resources design (Hilton and Culver 2005), and total bed material load estimation (Zakaria et al. 2010).

In this study, we investigated ANN, ANFIS, and GA approaches for forecasting the amount of sediment load in the Maku dam reservoir. Two normalization methods were applied to the input data in the ANN and ANFIS approaches; the GA does not require any normalization. The results are compared to determine which method performs better. Unlike previous articles, which do not compare the membership functions within each approach, here various membership functions are compared and the best ones are identified.

2 Materials and Methods

2.1 Definition of CM

In general, the three-section method is used when it is possible to sample sediments directly from a teleferic (cableway) bridge during floods. The multi-cross-sectional method, on the other hand, is usually applied only at first-class stations; in some cases, the three-section method can be used at second- and third-class stations. The operational sampling steps of this method are as follows:

  1. (a) A particular section of the river, where the discharge is measured, is selected.

  2. (b) The river section is divided into three parts with roughly equal discharges.

The sample concentrations are measured in the laboratory from the samples collected from the river. The mean concentration of the section (Cm or CM) is calculated by Eq. 1. The sample concentration at the fixed point (Cf or CF) is also measured in the laboratory. The ratio CM/CF, denoted K, is computed, and a chart of the change in K is drawn against the discharge of each section. Using the chart of K versus Q, it is then sufficient to sample only at the deepest point of each cross section, from which the value of CM is estimated. K takes values between 0.4 and 1.6.

$${\text{CM}} = \frac{{C_{1} Q_{1} + C_{2} Q_{2} + C_{3} Q_{3} + C_{4} Q_{4} + \cdots }}{{Q_{1} + Q_{2} + Q_{3} + Q_{4} + \cdots }}$$
(1)

where C1, C2, C3, C4 are the sediment concentrations in mg/l; Q1, Q2, Q3, Q4 are the water discharges in each part of the river section in m3/s, the sum of which is equal to the measured discharge of the entire section; and CM is the section mean concentration in mg/l.
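Equation 1 is simply a discharge-weighted average of the sub-section concentrations, which can be checked with a short sketch (the concentrations and discharges below are made-up illustrative values, not station data):

```python
def section_mean_concentration(c, q):
    """Discharge-weighted mean concentration CM (Eq. 1).
    c: sediment concentrations [mg/l] for each sub-section,
    q: water discharges [m^3/s] for each sub-section."""
    weighted = sum(ci * qi for ci, qi in zip(c, q))
    return weighted / sum(q)

# Example with three sub-sections of roughly equal discharge
cm = section_mean_concentration([120.0, 150.0, 130.0], [3.0, 3.2, 2.8])
```

Because the weights are the discharges, sub-sections carrying more flow contribute proportionally more to CM.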

2.2 General Assumption

Temperature [°C], water discharge [m3/s], and CM [mg/l] were the three input parameters, and sediment load [ton/day] was the output. There were 181 records for each parameter, so 724 values in total (4 × 181) were used in the approaches. Seventy percent of the data were used for training and 30 percent for testing.

In the ANN and ANFIS approaches, the input data are normalized with two methods, while the target data are kept natural (un-normalized). In the GA approach, both input and output data are natural. One normalization maps the data onto [− 1, + 1] and the other onto [− 2, + 2], as given in Eqs. 2 and 3, respectively.

$$X_{\text{normal}} = \left[ \frac{X_{\text{o}} - X_{\min}}{X_{\max} - X_{\min}} \right] \times 2 - 1$$
(2)
$$X_{\text{normal}} = \left[ \frac{X_{\text{o}} - X_{\min}}{X_{\max} - X_{\min}} \right] \times 4 - 2$$
(3)
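Both mappings are instances of min–max scaling onto a target interval, as the following sketch shows (the sample values are arbitrary):

```python
import numpy as np

def normalize(x, low=-1.0, high=1.0):
    """Min-max normalization of x onto [low, high].
    Eq. 2 corresponds to low=-1, high=+1; Eq. 3 to low=-2, high=+2."""
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())  # onto [0, 1]
    return scaled * (high - low) + low            # onto [low, high]

q = [1.14, 4.0, 10.21]          # e.g. monthly discharges [m^3/s]
q1 = normalize(q)               # Eq. 2: onto [-1, +1]
q2 = normalize(q, -2.0, 2.0)    # Eq. 3: onto [-2, +2]
```

The minimum of the series always maps to `low` and the maximum to `high`; the [− 2, + 2] variant simply doubles the spread of the normalized inputs.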

Three input patterns are introduced to the software as follows:

(1) temperature, discharge, and CM; (2) CM, temperature, and discharge; (3) discharge, CM, and temperature. After investigation, the first pattern was adopted as the only input pattern due to its better performance. MATLAB 7 was used for implementation of the methods, and Microsoft Excel 2010 for calculating the various statistical errors.

2.3 Performance Evaluation

To evaluate the external predictive performance of the models, 54 additional experiments were carried out as a test set. Equations 4–8 are used to evaluate the errors in this study. Some are used only in a specific approach, but the internal percentage error (PE) is used to evaluate the performance of all approaches. The other measures are the sum-squared error (SSE), mean absolute error (MAE), mean square error (MSE), and coefficient of correlation (R):

$${\text{PE}} = \sum {\left| {\frac{{Y_{i}^{\text{observed}} - Y_{i}^{\text{model}} }}{{Y_{i}^{\text{observed}} }}} \right|} \times 100$$
(4)
$${\text{SSE}} = \mathop \sum \limits_{i = 1}^{127} \left( {Y_{i}^{\text{observed}} - Y_{i}^{\text{model}} } \right)^{2}$$
(5)
$${\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left| {Y_{i} - \bar{Y}_{i} } \right|$$
(6)
$${\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {Y_{i} - \bar{Y}_{i} } \right)^{2}$$
(7)
$$R = \frac{{n\left( {\sum \bar{Y}_{i} Y_{i} } \right) - \left( {\sum \bar{Y}_{i} } \right)\left( {\sum Y_{i} } \right)}}{{\sqrt {\left[ {n\left( {\sum \bar{Y}_{i}^{2} } \right) - \left( {\sum \bar{Y}_{i} } \right)^{2} } \right]\left[ {n\left( {\sum Y_{i}^{2} } \right) - \left( {\sum Y_{i} } \right)^{2} } \right]} }}$$
(8)

where \(Y_{i}\) (or \(Y_{i}^{\text{observed}}\)) and \(\bar{Y}_{i}\) (or \(Y_{i}^{\text{model}}\)) denote the observed and estimated total sediment load at the ith step, respectively, and n is the number of time steps. MSE indicates the discrepancy between the observed and computed values; the lower the MSE, the more accurate the prediction. MAE is a linear scoring rule that describes only the average magnitude of the errors, ignoring their direction.
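For concreteness, Eqs. 4–8 can be written as short functions (the toy observed/modeled series below are invented purely to exercise the formulas):

```python
import numpy as np

def pe(obs, mod):
    """Internal percentage error (Eq. 4): sum of absolute relative errors x 100."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    return float(np.sum(np.abs((obs - mod) / obs)) * 100)

def sse(obs, mod):
    """Sum-squared error (Eq. 5)."""
    return float(np.sum((np.asarray(obs, float) - np.asarray(mod, float)) ** 2))

def mae(obs, mod):
    """Mean absolute error (Eq. 6)."""
    return float(np.mean(np.abs(np.asarray(obs, float) - np.asarray(mod, float))))

def mse(obs, mod):
    """Mean square error (Eq. 7)."""
    return float(np.mean((np.asarray(obs, float) - np.asarray(mod, float)) ** 2))

def r(obs, mod):
    """Pearson coefficient of correlation (Eq. 8)."""
    return float(np.corrcoef(np.asarray(obs, float), np.asarray(mod, float))[0, 1])
```

Note that PE as written in Eq. 4 is a sum rather than a mean of relative errors; the functions follow the equations as printed.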

2.4 The Study Area

The Maku dam watershed in Iran was chosen for this study. It lies between 38°57′N and 39°16′N latitude and 44°06′E and 44°39′E longitude, in the north-west of Iran, and its climate is semiarid. The data used in this study were obtained from the Mazra_e station. Figure 1 shows the Mazra_e station, which is located at about 39°10′N latitude and 44°25′E longitude at an elevation of 1712 m. A digital elevation model (DEM) was created in GIS, and the classification of elevation (nine classes) is shown by the different colors. The Maku dam site includes the Gizlarchay and Zangmar Rivers, which join at the back of the dam, as also displayed in Fig. 1. The minimum monthly average discharge, 1.14 m3/s, occurs in September, and the maximum, 10.21 m3/s, in May. Maku dam is an earth dam, 78 m high and 350 m long, and the area of the dam lake is about 800 hectares.

Fig. 1
figure 1

The study area with its details

2.5 ANN Approach

A three-layer BP network model has been proven satisfactory for forecasting and simulation as a general approximator (Hornik et al. 1989). Thus, BP network models with 2–5 layers are chosen for this study. The MLP is trained using the Levenberg–Marquardt technique, as it is more powerful than conventional gradient descent techniques (EL-bakyr 2003; Hagan and Menhaj 2010; Kisi 2004). In addition, networks with three transfer functions, namely "tansig", "logsig", and "purelin", with momentum back-propagation are designed in this study. Throughout all MLP simulations, adaptive learning rates were used to speed up training. The number of epochs (1000) and the maximum number of validation failures (10) are constant for all ANN networks. The number of layers, number of neurons, and type of transfer function are variable, except for the output layer, which is "purelin" because natural (un-normalized) target data are used, as mentioned before. The initial transfer function created appropriate initialization weights, which were used as the neural network's initial weights. To prevent overtraining, first a network just large enough to provide an adequate fit is selected via trial and error; secondly, the early stopping technique is applied. The data are divided into three subsets: the first is the training set, and the test and validation sets are taken to be the same. The error on the validation set is monitored during training. Because the network structure is selected to be optimal, the validation error decreases during the initial phase of training, as does the training error. Once the model is obtained, all weights are fixed and the generalization ability of the trained neural network is examined with the remaining 54 unused data sets.

As described above, 54 additional experiments were carried out as a test set to evaluate the external predictive performance of the neural network model.
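The early-stopping scheme described above can be sketched generically. In this sketch the `step` and `val_error` callables stand in for one training epoch and the validation-set error; they are assumptions for illustration (MATLAB's training routines handle this bookkeeping internally), but the 1000-epoch and 10-failure limits are the settings used for all ANN runs here.

```python
def train_with_early_stopping(step, val_error, max_fail=10, max_epochs=1000):
    """Generic early-stopping loop: run `step()` once per epoch and stop
    when the validation error has not improved for `max_fail` consecutive
    epochs. Returns the best validation error and the epoch it occurred."""
    best, fails, best_epoch = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step()                      # one training epoch (e.g. one LM update)
        err = val_error()           # error on the validation set
        if err < best - 1e-12:
            best, fails, best_epoch = err, 0, epoch
        else:
            fails += 1              # validation error did not improve
            if fails >= max_fail:
                break               # stop: overtraining is likely starting
    return best, best_epoch
```

In practice one would also restore the weights saved at `best_epoch`, which is the network whose generalization is then tested on the held-out 54 records.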

2.6 ANFIS Approach

In this study, in order to obtain the values of the output variable from those of the inputs, the neuro-fuzzy model implements the Sugeno fuzzy approach introduced by Takagi and Sugeno (1985). Eight types of membership functions (MFs) are used, and the number of MFs is determined iteratively. Ebtehaj and Bonakdari (2014) showed that the hybrid algorithm gives better results than back-propagation, so the hybrid algorithm was used here as well. The grid partitioning identification method of the Sugeno FIS is applied to map the nonlinear relationship between the input and output variables; grid partitioning independently partitions each antecedent variable by defining the membership functions of all antecedent variables. Two types of input normalization are implemented, as in the ANN approach, so the input variables of the ANFIS models are normalized while the output is natural.
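To illustrate the Sugeno inference that ANFIS builds on, here is a minimal first-order Sugeno evaluation for a single input with Gaussian MFs. The two rules and their parameters are invented for illustration only; they are not the fitted model of this study.

```python
import math

def gauss_mf(x, c, sigma):
    """Gaussian membership function ('gaussmf') centred at c."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_predict(x, rules):
    """First-order Sugeno inference for a single input: each rule is
    (centre, sigma, a, b) with linear consequent a*x + b; the output is
    the firing-strength-weighted average of the rule consequents."""
    w = [gauss_mf(x, c, s) for c, s, _, _ in rules]   # firing strengths
    f = [a * x + b for _, _, a, b in rules]           # rule outputs
    return sum(wi * fi for wi, fi in zip(w, f)) / sum(w)

# Two illustrative rules for a 'low'/'high' partition of one input
rules = [(0.0, 1.0, 1.0, 0.0), (5.0, 1.0, 2.0, 1.0)]
y = sugeno_predict(0.0, rules)
```

ANFIS learns the MF parameters (antecedents) and the linear consequent coefficients from data; grid partitioning generates one such rule for every combination of input MFs.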

2.7 GA Approach

In this study, the experimental design is a modified Box–Behnken design for three variables. An MLP is used to model the relationship between the input data and the output; in effect, it acts as a black box representing a model of the sediment load, with temperature, water discharge, and CM as inputs. The complete design consists of 127 experimental points, carried out in random order. The obtained data are analyzed to fit the following polynomial equation for sediment load. This nonlinear model corresponds to Eq. 9.

$$Y \, = \, \beta_{0} + \beta_{1} X_{1}^{\alpha 1} + \beta_{2} X_{2}^{\alpha 2} + \beta_{3} X_{3}^{\alpha 3} + \beta_{12} X_{1} X_{2} + \beta_{13} X_{1} X_{3} + \beta_{23} X_{2} X_{3}$$
(9)

where the β values are constant regression coefficients; temperature is denoted by X1, water discharge by X2, and CM by X3. These three independent variables are used to predict the sediment yield, denoted by Y. The GA method was applied to calculate the constant regression coefficients (β values) optimally. First, a model of the form of Eq. 9, which includes seven unknown parameters, was set up. These parameters are determined by the GA so as to minimize a cost function, the sum-squared error (SSE) defined in Eq. 5, where \(Y_{i}^{\text{observed}}\) is the ith observed sediment value and \(Y_{i}^{\text{model}}\) is the ith sediment value obtained from Eq. 9. The following parameter values were set for the GA in all simulations. Twenty chromosomes constructed the population; this population size is suitable because a high population size makes the algorithm complicated and slow, whereas a low population size may yield a weak algorithm that falls into a local optimum. In the early stages of the algorithm, when good solutions have not yet been found, the crossover rate (Xrate) is set to 0.6. After some generations, as the GA nears the optimal solution, the Xrate is reduced gradually, since more good parents survive and mate; this enables greater use of good parents to create excellent chromosomes. The initial mutation rate is set to 0.65 and is then gradually decreased. Finally, rank selection was chosen as the selection operator. Based on experience, the literature (Valizadeh et al. 2009), and trial and error, the maximum number of generations and the population size for the GA runs were selected as 1200 and 100, respectively. To evaluate the external predictive performance of the models, 54 additional experiments were carried out as a test set.
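The fitting procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: the data are synthetic, the exponents of Eq. 9 are held fixed at the values later reported in Eq. 10, the mutation rate is constant instead of following the adaptive schedule described above, and rank selection is combined with elitism.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(beta, X, alpha=(2.3, 1.3, 0.74)):
    """Eq. 9 with fixed exponents; beta = (b0, b1, b2, b3, b12, b13, b23)."""
    x1, x2, x3 = X.T
    return (beta[0] + beta[1] * x1 ** alpha[0] + beta[2] * x2 ** alpha[1]
            + beta[3] * x3 ** alpha[2] + beta[4] * x1 * x2
            + beta[5] * x1 * x3 + beta[6] * x2 * x3)

def fit_ga(X, y, pop_size=20, generations=300, mrate=0.65):
    """Minimal real-coded GA minimizing the SSE cost of Eq. 5."""
    pop = rng.normal(size=(pop_size, 7))
    def sse(p):
        return float(np.sum((y - model(p, X)) ** 2))
    for _ in range(generations):
        pop = pop[np.argsort([sse(p) for p in pop])]   # sort: best first
        ranks = np.arange(pop_size, 0, -1, dtype=float)
        probs = ranks / ranks.sum()                    # rank selection weights
        children = [pop[0].copy()]                     # elitism: keep the best
        while len(children) < pop_size:
            i, j = rng.choice(pop_size, size=2, p=probs)
            a = rng.random()                           # blend crossover
            child = a * pop[i] + (1 - a) * pop[j]
            # mutate each gene with probability mrate
            child += rng.normal(scale=0.05, size=7) * (rng.random(7) < mrate)
            children.append(child)
        pop = np.array(children)
    return min(pop, key=sse)

# Synthetic illustration: 40 samples of (temperature, discharge, CM)
X = rng.uniform(0.5, 3.0, size=(40, 3))
beta_true = np.array([-1.0, 2.0, 1.5, 3.0, 0.1, -0.2, 0.3])
y = model(beta_true, X)
beta_hat = fit_ga(X, y)
```

Elitism guarantees that the best SSE never worsens from one generation to the next, which is why the cost curve in such runs flattens once no better chromosome is found.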

3 Results and Discussion

ANN Considering the different forms of normalization in the ANN approach, different results were achieved, as displayed in Tables 1 and 2. Table 1 shows the results for input normalization on the − 1 to + 1 interval, and Table 2 for the − 2 to + 2 interval. Both "logsig" and "tansig" MFs are used in the input and hidden layers. All output layers are of "purelin" type, because natural target data are used; for brevity they are not listed in the tables. The results show the following.

Table 1 ANN results for normalization between − 1 and + 1
Table 2 ANN results for normalization between − 2 and + 2

Network regression performance for the training, validation, test, and total steps for the − 2 to + 2 and − 1 to + 1 intervals is presented in Figs. 2 and 3, respectively. Figure 2 clearly shows better results at all steps.

Fig. 2
figure 2

One of the runs for normalization between − 2 and + 2 interval

Fig. 3
figure 3

One of the runs for normalization between − 1 and + 1 interval

  1. (A) Comparison of all results in both tables showed that the "logsig" MF with five neurons has the best performance (row 5 of Table 2), with error values of 0.303 for training and 0.106 for testing. However, the results are not stable and fluctuate considerably with the MF type and the number of neurons.

  2. (B) Networks with seven neurons produced better answers, in terms of both low error rate and error distribution, than the other configurations.

  3. (C) A peer-to-peer comparison of the two normalization methods indicated that the error values for the − 2 to + 2 range (3.611 for training and 2.217 for testing) are lower than the corresponding values for the other range (7.644 for training and 4.574 for testing). This better performance is also visible when comparing Fig. 2 with Fig. 3.

  4. (D) The "logsig" MF gives better results in both normalization intervals, with the lowest training and testing errors and an average error of 3.853.

  5. (E) An average PE of 4.515% was achieved over all runs of this approach, consisting of 5.63% PE in the training phase and 3.40% in the testing phase.

ANFIS Several types of membership functions (MFs) can be used to implement fuzzy logic. Some studies have shown that the type of MF does not fundamentally affect the results (Vernieuwe et al. 2005), but this study, like Kisi and Shiri's (2012) paper, emphasizes the importance of the MF type for the obtained results. The results for these MFs are given in Tables 3 and 4. Increasing the number of epochs causes the errors to decrease, as can be seen for a training run in Fig. 4. The results show the following:

Table 3 ANFIS results for normalization between − 1 and + 1
Table 4 ANFIS results for normalization between − 2 and + 2
Fig. 4
figure 4

Error via epochs performance

  1. (A) In general, for each type of MF, the error values decrease as the number of epochs increases, up to a specific number; this behavior is also shown in Fig. 4. The "trimf" MF is the only function whose answers did not change with the epochs; therefore, this function is considered unsuitable for this study.

  2. (B) The "dsigmf" and "psigmf" MFs produced very close output results in both tables.

  3. (C) The best and the worst results are observed for the "gaussmf" and "pimf" MFs, respectively, and the results showed less fluctuation than in the ANN approach.

  4. (D) Input normalization on the − 1 to + 1 interval produced an average error of 0.679, while the second interval produced 1.245. Thus the first normalization performs better in this approach, in contrast to the ANN approach.

  5. (E) An average PE of 0.968% was achieved over all runs of this approach.

GA Table 5 displays the best performance of the GA approach, consisting of the training and testing errors together with the numbers of generations and the population size. Figure 5 shows the best (elite) SSE minimized by the GA over 1200 generations.

Table 5 The best performance of GA
Fig. 5
figure 5

The best minimized error (SSE) computed by GA

As this figure shows, no significant improvement in the cost was achieved after about 350 generations. Therefore, one of the stopping criteria is satisfied: the algorithm can be terminated after 350 generations and the optimal solution (elite) returned. The accuracy of the proposed model is validated by conducting further runs under different conditions and comparing the obtained results with the model. The internal percentage error (PE) of the proposed model can be calculated using Eqs. 4 and 5 for the 154 experiments. The average PE for all 154 experiments obtained by the GA is 10%. The modified model obtained by the GA for sediment yield is the following equation:

$$Y = \, - 55.47 + 2.41X_{1}^{2.3} + 2.01X_{2}^{1.3} + 24.40X_{3}^{0.74} + 0.013X_{1} X_{2} - 0.45X_{1} X_{3} + 0.30X_{2} X_{3}$$
(10)

Here, \(Y_{i}^{\text{model}}\) is the sediment yield calculated using Eq. 10, and \(Y_{i}^{\text{observed}}\) is the sediment yield obtained under the defined conditions of each experiment. The genetic algorithm is used to find the optimum values of the model by minimizing an error cost function. The results show the following:

  1. (A) In this approach, various training and testing error values were observed as the population size and the number of generations were decreased and increased. Trial and error showed that a population of 100 individuals and 1200 generations produced the best results; further increasing the population size and the number of generations generally does not decrease the error.

  2. (B) The advantage of this method is the use of non-normalized (natural) values, which reflects the adaptability of the algorithm to the distribution of nonlinear phenomena such as sediment.

  3. (C) This approach had a PE of 7% for the training error and 13% for the test error, giving an average PE of 10%.
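Equation 10 can be applied directly once the three inputs are known. The following sketch evaluates it for one made-up input triple; the values are illustrative only, not measurements from the Mazra_e station.

```python
def sediment_yield(x1, x2, x3):
    """Sediment yield Y [ton/day] from Eq. 10, where x1 is temperature
    [deg C], x2 is water discharge [m^3/s], and x3 is the section mean
    concentration CM [mg/l]."""
    return (-55.47 + 2.41 * x1 ** 2.3 + 2.01 * x2 ** 1.3
            + 24.40 * x3 ** 0.74 + 0.013 * x1 * x2
            - 0.45 * x1 * x3 + 0.30 * x2 * x3)

# Illustrative evaluation for (temperature, discharge, CM) = (10, 5, 2)
y = sediment_yield(10.0, 5.0, 2.0)  # ≈ 477 ton/day
```

Because the exponents on the three main terms exceed or approach 1, the predicted yield grows faster than linearly with temperature and discharge, which matches the nonlinear character of the phenomenon discussed above.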

4 Summary and Conclusions

Sedimentation in dam reservoirs is one of the most important problems threatening dam capacity. Moreover, sediment transport is one of the most complicated topics in river engineering. It is therefore important for engineers to forecast the amount of sediment discharge at the inlets of dams, and many methods have been created and used by researchers to manage this problem. We used three artificial intelligence (AI) approaches for forecasting the amount of sediment load entering the Maku dam reservoir: the artificial neural network (ANN), the adaptive neuro-fuzzy inference system (ANFIS), and the genetic algorithm (GA). Temperature [°C], water discharge [m3/s], and CM (the section mean concentration from the three-section sediment sampling method) [mg/l] were the three input parameters of the models. Two normalization methods were applied to these input parameters in the ANN and ANFIS approaches: the first onto the interval between − 1 and + 1, and the second between − 2 and + 2. The GA approach does not require any normalization at the input stage. Sediment discharge [ton/day] was the only output parameter; the output data were not normalized in any of the three approaches, and natural values were used throughout. Measures such as the sum-squared error (SSE), mean absolute error (MAE), mean square error (MSE), and coefficient of correlation (R) were used to evaluate the performance of the membership functions (MFs) within each approach, while the internal percentage error (PE) was used to compare the performance of the three approaches. The results for ANN revealed that the "logsig" MF with five neurons had the best performance among all runs, and that normalization onto the − 2 to + 2 range performed better than the − 1 to + 1 range. The results for ANFIS indicated that the "gaussmf" MF had the best performance, and that normalization onto − 1 to + 1 was better than the other interval, in contrast to the ANN approach.
The results for the GA showed that a population of 100 individuals and 1200 generations produced the best performance. Finally, ANFIS, with an average PE of 0.968%, had the least error, and ANN, with an average PE of 5.63%, was in second place. Although the GA, with an average PE of 10%, was in third place, considering that it did not require any normalization at the input stage, it can be said to have a distinct advantage over the other two approaches.