1 Introduction

The estimation of suspended sediment loads (SSL) is required in river restoration, stable channel design and water quality assessment. In practice it is a difficult task, however, since sediment transport is highly nonlinear and governed by many factors, including the strength of flow, sediment supply and the river bed. Conventional sediment rating curves (SRC) cannot provide sufficiently accurate estimates, which has been attributed to the misleading practice of relating sediment load to discharge (McBean and Al-Nassri 1988; Demirci and Baltaci 2013). Artificial intelligence techniques have proven to be efficient tools for modeling sediment loads. Alp and Cigizoglu (2007) employed two artificial neural network (ANN) models, i.e. the feedforward back propagation (FFBP) method and radial basis functions (RBF), to estimate daily SSL. The use of support vector machines (SVM) for SSL estimation in rivers was investigated by Cimen (2008). Lafdani et al. (2013) used a combination of gamma test and genetic algorithm (GT-GA) to identify the best inputs of SVM and ANN models for daily SSL prediction. These models can capture the nonlinear behavior of sediment data without going into the details of the physical processes in the watershed. Conversely, their fully implicit and physically meaningless structure is also the major criticism levelled at them. It is therefore still necessary to develop estimation models built on conceptual ideas that reflect the characteristics of sediments.

The fuzzy nature of SSL series calls for fuzzy and highly nonlinear methods for sediment simulation. Fuzzy logic has been accepted as a good procedure for suspended sediment estimation (Tayfur et al. 2003; Kisi et al. 2006; Demirci and Baltaci 2013). It yielded better results than SRC and ANN models, since the degree of ‘belongingness’ to a set or category is described by a membership number. Neuro-fuzzy models for suspended sediment estimation were found to perform well too (Cobaner et al. 2009; Kisi et al. 2009; Rajaee et al. 2009). However, these approaches are difficult to manipulate, as they need different membership functions in different cases. A flexible and transparent model that allows the fuzzy concept to be implemented in the activation functions is therefore desirable. Qiu et al. (1998) introduced a fuzzy pattern-recognition activation function (from the input layer to the hidden layer) into an ANN model for annual runoff forecasting. This function classified inputs into a number of categories in terms of different patterns. In this way, the fuzziness of runoff was considered with respect to the seasonal characteristics of the river system. Zhao and Chen (2008) further applied this model to predictions in ungauged basins using hydrological data from similar gauged basins. In their study, the fuzzy pattern-recognition activation function was employed to connect the hidden nodes and the network output. This method offers a practically significant advantage over other fuzzy-based models and is employed in this study for SSL estimation.

In addition, the time-varying nature of the sediment transport process can be considered by adding a continuity equation to the ANN structure, as inspired by Yang et al. (1998) and Li and Gu (2003). In their works, the nodes in the hidden/output layers were regarded as storage reservoirs, and the continuity equation was satisfied as the river flows from upstream to downstream sections. Yang et al. (1998) successfully forecasted the monthly river flow at the Salford University station in the Irwell River basin. Li and Gu (2003) extended this method to sediment yield forecasting. They obtained satisfactory results and encouraged the use of the continuity equation in modeling sediment loads. Through the continuity equation, spatial and temporal factors are taken into account in the sediment transport process, which sheds light on the effect of upstream sediment loads. Thus, this method can build a relationship between upstream and downstream sediment loads and is feasible for an SSL estimation model. It is preferred to completely physics-based approaches, for which detailed environmental data are generally not available and simplifying assumptions are unrealistic (Kothyari et al. 1997; Kouassi et al. 2013).

Traditional multi-layer feedforward neural networks (MFNN) have some drawbacks in architecture and regularization. He (1993) proposed the double parallel feedforward neural network (DPFNN), which involves a paratactic relationship between linear and nonlinear mapping. It is a parallel connection of a multi-layer feedforward neural network and a two-layer feedforward neural network. The multi-layer network uses its hidden nodes to adjust the solution and thus improves the nonlinear mapping performance, while the two-layer network provides a high learning speed for the linear solution (He 1994). It was demonstrated that DPFNN has faster convergence and better generalization capability than MFNN (Zhong and Ding 2005; Wang et al. 2011). When particle swarm optimization is used for feature selection, the DPFNN model can rectify the over-fitting problem as well (Huang and He 2007). It has been used for hyperspectral data classification (He and Huang 2005), concentration estimation of gas mixtures (Zhao et al. 2010) and water diversion demand estimation (Khan et al. 2014), and has proved to be a promising method for regression and prediction.

The purpose of this paper is to develop a novel estimation model that combines fuzzy pattern-recognition, the continuity equation and a double feedforward neural network. In addition to river flows, the influence of sediment loads at the upstream river sections is investigated. Two sediment stations on the Muddy Creek in Montana, USA, are used as case study sites.

2 Description of Estimation Models

2.1 Multi-Layer Feedforward Neural Network (MFNN)

The three-layer feedforward neural network, consisting of the input, hidden and output layers, is the most widely used MFNN model. The input layer introduces the input data \( \{p_1, p_2, \dots, p_k\} \) to the network. The weighted sum of the inputs plus a bias is passed through a predetermined activation function f(.) to the nodes in the hidden layer (Thirumalaiah and Deo 1998):

$$ {t}_i=f\left({\displaystyle \sum_{j=1}^k{p}_j{w}_{ji}}+{b}_i\right) $$
(1)

where \( t_i \) (i = 1, 2, …, s) represent the nodes in the hidden layer and \( p_j \) (j = 1, 2, …, k) represent the nodes in the input layer. The weight parameter connecting the input layer and the hidden layer is denoted by \( w_{ji} \), and \( b_i \) is the bias value. Similarly, the node in the output layer is computed from the nodes in the hidden layer (Thirumalaiah and Deo 1998):

$$ y=F\left({\displaystyle \sum_{i=1}^s{t}_i}{\overline{w}}_i+\overline{b}\right) $$
(2)

in which y represents the single node in the output layer and F(.) is the activation function for the output layer. The weight parameters from the hidden layer to the output layer and the bias are denoted by \( {\overline{w}}_i \) and \( \overline{b} \), respectively. For traditional MFNN models, the activation function f(.) is usually a radial basis function or a sigmoid function, and F(.) is a linear function. These functions relate the nodes of two adjacent layers but have no physical meaning. The MFNN model for SSL estimation is therefore limited by its neglect of sediment properties.
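
To make Eqs. (1) and (2) concrete, the following is a minimal Python sketch of a single forward pass, assuming a sigmoid for f(.) and a linear F(.). The function names `sigmoid` and `mfnn_forward` and the nested-list weight layout are illustrative assumptions, not part of the original model description.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, one common choice for the hidden activation f(.)."""
    return 1.0 / (1.0 + math.exp(-x))

def mfnn_forward(p, w, b, w_bar, b_bar):
    """Output y of a three-layer MFNN for one sample.

    p     : list of k inputs
    w     : k x s nested list, w[j][i] links input j to hidden node i
    b     : list of s hidden-layer biases
    w_bar : list of s hidden-to-output weights
    b_bar : output bias
    """
    k, s = len(p), len(b)
    # Eq. (1): t_i = f(sum_j p_j * w_ji + b_i)
    t = [sigmoid(sum(p[j] * w[j][i] for j in range(k)) + b[i]) for i in range(s)]
    # Eq. (2) with a linear F(.): y = sum_i t_i * w_bar_i + b_bar
    return sum(t[i] * w_bar[i] for i in range(s)) + b_bar
```

With all input-to-hidden weights and biases set to zero, every hidden node equals 0.5, which gives a quick sanity check of the implementation.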

2.2 Double Parallel Feedforward Neural Network (DPFNN)

As can be seen in Fig. 1a, the DPFNN model is developed from the MFNN model by connecting two networks in parallel with the same k input nodes. For the three-layer neural network of DPFNN, the nodes in the hidden layer \( (t_1, t_2, \dots, t_s) \) are computed by Eq. (1) and then connected to the output with \( {\overline{w}}_i \) in the same manner. Analogously, for the two-layer neural network, the weight parameters leading directly from the input layer to the output layer are denoted by \( v_j \) (j = 1, 2, …, k). The node in the output layer is given by the following equation (Zhong and Ding 2005):

Fig. 1 (a) Topological structure of MFNN and DPFNN models; (b) calibration process using the DE algorithm for the estimation models; (c) topological structure of HFNN and HDFNN models

$$ y={\displaystyle \sum_{i=1}^s{t}_i}{\overline{w}}_i+{\displaystyle \sum_{j=1}^k{p}_j}{v}_j $$
(3)

That is, the output is the sum of two parallel neural networks. The procedure for computing y from its inputs is illustrated in Fig. 1a for the MFNN and DPFNN models, whilst the calibration process for searching for the optimized parameters is shown in Fig. 1b. For a given set of training samples \( (p^n, Y^n) \) supplied to the model, the error function is defined as:

$$ E(W)=\sqrt{\frac{1}{N}{\displaystyle \sum_{n=1}^N{\left({y}^n-{Y}^n\right)}^2}} $$
(4)

where the vector W collects all unknown parameters and varies with the estimation model; \( y^n \) and \( Y^n \) are the computed and desired outputs (n = 1, 2, …, N), respectively, and N is the number of training samples. The objective of network training is hence to find \( W_{opt} \) such that \( E(W_{opt}) = \min E(W) \). As shown in Fig. 1b, the vector W is updated together with the fitness value E(W) and is finally output once the stopping criterion is satisfied. In the present paper, differential evolution (DE) is employed as the optimization technique to find the minimum value of the error function and the corresponding \( W_{opt} \). DE is a widely used population-based optimization algorithm, which is favourable for searching the parameters of non-differentiable and time-varying models (Storn and Price 1995; Rocca et al. 2011; Li et al. 2013). It conducts mutation, crossover and selection operations based on the differences between randomly sampled pairs of solutions in the population, which helps it to avoid local optima and allows fast convergence; details can be found in Chen et al. (2015).
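
The DPFNN output of Eq. (3), the error function of Eq. (4) and a DE search can be sketched together as follows. This is a minimal illustration under stated assumptions: a sigmoid hidden activation, the common DE/rand/1/bin variant, and hypothetical control settings (`pop_size`, `f`, `cr`, `generations`); the source does not specify the DE variant or settings used, and all function names are illustrative.

```python
import math
import random

def dpfnn_forward(p, w, b, w_bar, v):
    """Eq. (3): hidden-layer branch plus direct input-to-output branch."""
    k, s = len(p), len(b)
    t = [1.0 / (1.0 + math.exp(-(sum(p[j] * w[j][i] for j in range(k)) + b[i])))
         for i in range(s)]
    return sum(t[i] * w_bar[i] for i in range(s)) + sum(p[j] * v[j] for j in range(k))

def rmse_error(predict, samples):
    """Eq. (4): root mean square error over (inputs, target) pairs."""
    return math.sqrt(sum((predict(p) - y_obs) ** 2 for p, y_obs in samples) / len(samples))

def differential_evolution(objective, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin minimiser; returns (best_vector, best_value).

    The control settings are assumed defaults, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Binomial crossover with one guaranteed mutant component
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Keep the trial vector inside the search bounds
            trial = [min(max(trial[d], bounds[d][0]), bounds[d][1]) for d in range(dim)]
            # Selection: accept the trial only if it is no worse
            f_trial = objective(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

In a calibration run, `objective` would unpack a flat parameter vector W into the weights and biases of the chosen model and return `rmse_error` over the training samples. Because the objective is only evaluated, never differentiated, the same routine applies to any of the four models.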

2.3 Hybrid Feedforward Neural Network (HFNN)

The above two models are incapable of distinguishing the influences of different inputs; thus \( \{p_1, p_2, \dots, p_k\} \) is employed to represent any potential inputs for SSL estimation. In practice, some previous studies estimated sediment from the river flow and sediment data at the same station (Aytek and Kisi 2008; Afan et al. 2014), while others focused on estimating downstream sediment data by using data from both upstream and downstream stations (Kisi 2004; Partal and Cigizoglu 2008). In this study, river flows Q at either the upstream or the downstream station and SSL from upstream stations are involved as inputs. When fed with different inputs, the output SSL at the downstream station is obtained in different manners.

In this section, a hybrid feedforward neural network (HFNN) is developed with respect to the river flow inputs \( (Q_1^{in}, Q_2^{in}, \dots, Q_k^{in}) \). A conceptual activation function based on fuzzy pattern-recognition is introduced as follows (Qiu et al. 1998):

$$ {Q}_i=\frac{1}{{\displaystyle \sum_{l=1}^C\frac{{\displaystyle \sum_{j=1}^k{\left[{w}_{ji}\left({Q}_j^{in}-{M}_i\right)\right]}^2}}{{\displaystyle \sum_{j=1}^k{\left[{w}_{ji}\left({Q}_j^{in}-{M}_l\right)\right]}^2}}}} $$
(5)

where \( Q_i \) (i = 1, 2, …, s) are the nodes in the hidden layer and \( Q_j^{in} \) (j = 1, 2, …, k) are the nodes in the input layer. The model vector \( M=[M_i]=[M_l] \) contains a number of patterns in the hidden layer. It embodies the fuzzy pattern-recognition idea in the hidden layer, since the inputs are classified into a number of categories in terms of different patterns. The parameter C refers to the number of elements in the model vector as well as the number of nodes in the hidden layer (i.e. C = s). Generally, a higher value of C yields a higher precision of the estimation result, since it implies more categories in the hidden layer and represents a higher degree of nonlinearity. A general expression for the vector M is as follows: if the number of nodes in the hidden layer equals C (≥ 2), then \( M=\left(1.0,\ \frac{C-2}{C-1},\ \frac{C-3}{C-1}, \dots,\ \frac{1}{C-1},0\right) \). The degree of membership is 1.0 for the “wet” model in the wet season and 0 for the “dry” model in the dry season; thus, the defined vector fully covers the models ranging from the “wet” to the “dry” season. Meanwhile, the activation function from the hidden layer to the output layer is given as follows:

$$ SS{L}^{(1)}={a}_0\times {\left({\displaystyle \sum_{i=1}^s{Q}_i}{\overline{w}}_i+\overline{b}\right)}^{b_0} $$
(6)

where \( SSL^{(1)} \) represents the node in the output layer; \( {\overline{w}}_i \) and \( \overline{b} \) denote the weight parameters and the bias for the output layer, respectively. The activation function in Eq. (6) expresses a power-law relationship between river flows and sediment loads, which is the functional form generally used for the SRC. The values of \( a_0 \) and \( b_0 \) for a specific river are optimized in the training process of the neural network. The structure of the HFNN model is depicted in Fig. 1c, where \( SSL^{(1)} \) is considered as the final output with inputs \( (Q_1^{in}, Q_2^{in}, \dots, Q_k^{in}) \). Accordingly, the HFNN model examines the relationship between Q and SSL by considering the fuzzy property of sediment loads in an MFNN structure.
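
The HFNN forward pass of Eqs. (5) and (6) can be sketched in Python as below. Several points are assumptions made only for this illustration: the flow inputs are taken to be pre-scaled to [0, 1] so that they are comparable with the pattern vector M; an input that coincides exactly with a pattern is given membership 1 in that pattern and 0 in the others to avoid division by zero; the base of the power in Eq. (6) is clipped at zero and \( b_0 \) is taken to be positive. The function names are hypothetical.

```python
def pattern_vector(c):
    """M = (1, (C-2)/(C-1), ..., 1/(C-1), 0) for C >= 2 hidden nodes."""
    return [(c - 1 - i) / (c - 1) for i in range(c)]

def fuzzy_hidden_layer(q_in, w, m):
    """Eq. (5): relative membership of the inputs in each of the C patterns.

    q_in : list of k flow inputs (assumed scaled to [0, 1])
    w    : k x C nested list of weights w[j][i]
    m    : list of C pattern values
    """
    k, c = len(q_in), len(m)

    def dist(i, l):
        # Weighted squared distance to pattern l using the weights of node i
        return sum((w[j][i] * (q_in[j] - m[l])) ** 2 for j in range(k))

    memberships = []
    for i in range(c):
        d_ii = dist(i, i)
        if d_ii == 0.0:
            memberships.append(1.0)   # input coincides with pattern i
            continue
        dists = [dist(i, l) for l in range(c)]
        if any(d == 0.0 for d in dists):
            memberships.append(0.0)   # input coincides with another pattern
            continue
        memberships.append(1.0 / sum(d_ii / d for d in dists))
    return memberships

def hfnn_output(q_in, w, w_bar, b_bar, a0, b0):
    """Eq. (6): power-law output SSL^(1) = a0 * (sum_i Q_i * w_bar_i + b_bar)^b0."""
    q_hidden = fuzzy_hidden_layer(q_in, w, pattern_vector(len(w_bar)))
    base = sum(q * wb for q, wb in zip(q_hidden, w_bar)) + b_bar
    # Clipping at zero is an assumption that keeps the power real-valued
    return a0 * max(base, 0.0) ** b0
```

For a single input of 0.25 with unit weights and C = 2, the sketch assigns memberships of 0.1 to the “wet” pattern and 0.9 to the “dry” pattern, so the memberships sum to one as expected for a relative membership degree.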

2.4 Hybrid Double Feedforward Neural Network (HDFNN)

In this section, a hybrid double feedforward neural network (HDFNN) is developed for the case in which the sediment data at the upstream river stations are included as inputs. These sediment inputs act directly on the output through a two-layer neural network. In the representation of a river system, the upstream stations are regarded as nodes in the input layer and the downstream station as the node in the output layer. Thus, mass conservation is satisfied over the river network by the following continuity equation (Li and Gu 2003):

$$ \frac{\partial SD}{\partial T}={\displaystyle \sum_{i=1}^h{v}_i}{Q}_i^s-{Q}^s $$
(7)

where SD and \( Q^s \) are respectively the sediment deposition and the sediment transport rate at the downstream station, and T is time. Meanwhile, \( Q_i^s \) is the sediment transport rate at each upstream station, where i (= 1, 2, …, h) is the index of each node in the input layer. The fraction of sediment from a node in the input layer entering the node in the output layer is denoted by \( v_i \). From a physical point of view, Eq. (7) implies that the rate of change of sediment deposition in the current river section is determined by the difference between the sediment transport rate supplied from the upstream river reaches and that leaving the section, which expresses sediment mass conservation over the entire river system. After discretization, the SD at time T + ΔT is determined by the following equation:

$$ S{D}_{\left(T+\Delta T\right)}=S{D}_{(T)}+\left({\displaystyle \sum_{i=1}^h{v}_i}{Q_i^s}_{(T)}-{Q}_{(T)}^{\mathrm{s}}\right)\times \Delta T $$
(8)

Multiplying the sediment transport rate \( Q^s \) by a time step ΔT gives the change in mass during the time step; thus the daily SSL can be written as \( SSL = Q^s \times \Delta T \) with ΔT = 1 day. Accordingly, Eq. (8) can be written as follows:

$$ S{D}_{\left(T+1\right)}=S{D}_{(T)}+\left({\displaystyle \sum_{i=1}^h{v}_i}SS{L_i}_{(T)}-SS{L}_{(T)}\right) $$
(9)

Equation (9) in its simplified form is given by:

$$ S{D}_{\left(T+1\right)}={\lambda}_{(T)}\times \left(S{D}_{(T)}+{P}_{(T)}\right) $$
(10)

wherein \( {\lambda}_{(T)}=1-\frac{SS{L}_{(T)}}{S{D}_{(T)}+{P}_{(T)}} \) and \( {P}_{(T)}={\displaystyle \sum_{i=1}^h{v}_iSS{L}_{i(T)}} \). Here λ can be regarded as a recession coefficient, which is assumed to be independent of time (Yang et al. 1998). An initial value of sediment deposition \( SD_0 \) is given in advance, and the value of SD at each time step can then be computed from Eq. (10). The SSL in the output layer is evaluated as a nonlinear function of sediment deposition (Li and Gu 2003):

$$ SS{L}_{{}_{(T)}}^{(2)}=\frac{1}{1+ \exp \left[-\left(S{D}_{(T)}+{P}_{(T)}\right)\right]} $$
(11)
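
The recursion of Eqs. (10) and (11) can be sketched in Python as follows. The sketch assumes that λ is a constant parameter to be calibrated (consistent with the time-independence assumption above), that the upstream loads have been scaled beforehand, and that the output, which lies in (0, 1), is rescaled to physical units elsewhere. The function name `sediment_branch` and its arguments are illustrative.

```python
import math

def sediment_branch(ssl_up, v, lam, sd0=0.0):
    """Return SSL^(2) at each time step from the upstream sediment loads.

    ssl_up : list over time; each item is a list of h upstream loads (assumed scaled)
    v      : list of h fractions v_i reaching the downstream node
    lam    : recession coefficient lambda, assumed constant in time
    sd0    : initial sediment deposition SD_0
    """
    sd = sd0
    out = []
    for loads in ssl_up:
        p_t = sum(v_i * x for v_i, x in zip(v, loads))   # P_(T)
        storage = sd + p_t                               # SD_(T) + P_(T)
        out.append(1.0 / (1.0 + math.exp(-storage)))     # Eq. (11)
        sd = lam * storage                               # Eq. (10): SD_(T+1)
    return out
```

Because each SD value carries over a fraction λ of the previous storage, a pulse of upstream sediment continues to influence the output on later days, which is how the time-varying behaviour enters the network.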

The HDFNN model adopts two separate neural networks to represent the different influences of river flows and upstream sediment loads on the downstream SSL, which differs from the DPFNN model, where the same input variables are used in the two parallel networks. In other words, two neural networks with respect to \( (Q_1^{in}, Q_2^{in}, \dots, Q_k^{in}) \) and \( (SSL_1^{in}, SSL_2^{in}, \dots, SSL_h^{in}) \) are involved, as shown in Fig. 1c. The final output SSL is the sum of \( SSL^{(1)} \) and \( SSL^{(2)} \). Accordingly, the HDFNN model can deal with two separate sets of inputs owing to its double networks. Besides, the inclusion of fuzzy pattern-recognition and the continuity equation in the neural networks enables the fuzzy and time-varying features of sediment loads to be considered.

3 Study Area

The time series of daily river flow and suspended sediment data used in this study come from two stations on the Muddy Creek near Vaughn in Montana, USA. The drainage areas at these sites are 730.377 km² for the upstream station (station No. 06088300) and 813.256 km² for the downstream station (station No. 06088500), as shown in Fig. 2. These two stations have been studied in several works (Browning et al. 2005; Kisi and Fedakar 2014), which supports the reliability of the collected data. The objective of this work is to estimate the suspended sediment loads at the downstream station (\( SSL_d \)) based on river flows at either the upstream or the downstream station (\( Q_u \) or \( Q_d \)) and sediment loads at the upstream station (\( SSL_u \)).

Fig. 2 Locations of stations on the Muddy Creek near Vaughn in Montana, USA

The daily dataset was obtained from the US Geological Survey (USGS) and covers a period of 4 years from 1 January 1977 to 31 December 1980. The discharge and sediment data for the upstream and downstream stations are plotted in Fig. 3. There is a highly nonlinear relationship between discharge and sediment data for both stations. Outliers are present as well, particularly in the sediment data. In the downstream dataset, only four values above or near 40,000 ton/day are observed, while the others are below 20,000 ton/day. These outliers may pose difficulties for the estimation models.

Fig. 3 Scatter plots of sediment load versus discharge for (a) upstream and (b) downstream data

For the purpose of calibration and estimation, the data of years 1977 and 1978 are used for training, whilst those of year 1980 are used for testing. The remaining data of year 1979 (around 25 % of the whole dataset) are used for validation, which is an indispensable step to avoid over-fitting. The statistical parameters of the river flow and sediment data for the two stations are summarized in Table 1, in which \( X_{mean} \), \( X_{median} \), \( S_X \), \( X_{max} \) and \( X_{min} \) denote the mean, median, standard deviation, maximum and minimum, respectively. A noticeable difference between \( X_{mean} \) and \( X_{median} \) is detected for the sediment data, which provides supporting evidence for the existence of outliers. The high values of \( S_X \) indicate the complexity of the sediment data, which may have a negative effect on the estimation performance. Besides, the \( X_{min} \) value in the training set is higher than that in the corresponding testing set for both \( Q_u \) and \( SSL_u \). This may cause extrapolation difficulties in the estimation of low sediment values. In short, the sediment data to be estimated are irregular and ambiguous, and a model that can fit the highly nonlinear relationship between \( SSL_d \) and the inputs is needed.
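
The chronological split described above can be sketched in Python as follows, assuming the daily records are available as (date, value) pairs; the function name `split_by_year` and the record layout are hypothetical.

```python
from datetime import date

def split_by_year(records):
    """Assign daily (date, value) records to training, validation and testing sets."""
    roles = {1977: "train", 1978: "train", 1979: "validation", 1980: "test"}
    subsets = {"train": [], "validation": [], "test": []}
    for day, value in records:
        subsets[roles[day.year]].append((day, value))
    return subsets
```

With complete daily coverage, the four years contain 1461 days in total, of which the 365 days of 1979 make up about 25 %, in agreement with the proportion quoted above.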

Table 1 Summary of statistical parameters for data at two stations

4 Results and Discussion

In order to undertake the comparison of performances by different inputs and models, two evaluation criteria are employed in the present study, i.e. root mean square error (RMSE) and Nash-Sutcliffe efficiency coefficient (NSEC). They are determined by the following equations:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{N}{\displaystyle \sum_{i=1}^N{\left({Y}_i-{\widehat{Y}}_i\right)}^2}} $$
(12)
$$ \mathrm{NSEC}=1-{\displaystyle \sum_{i=1}^N{\left({Y}_i-{\widehat{Y}}_i\right)}^2/{\displaystyle \sum_{i=1}^N{\left({Y}_i-\overline{Y}\right)}^2}} $$
(13)

where \( Y_i \) and \( {\widehat{Y}}_i \) are respectively the observed and computed values (i = 1, 2, …, N), \( \overline{Y} \) is the mean of the observed data, and N is the number of observations. The RMSE represents the sample standard deviation of the differences between computed and observed values. The NSEC expresses the relative magnitude of the residual variance compared to the variance of the observed data. Both are used to assess the predictive power of hydrological models. The RMSE can also be employed as the error function in the calibration period, whereas the NSEC is less suitable for this purpose because of its sensitivity to extreme values and large outliers. However, the NSEC is effective for evaluating the performances of models with different sets of data, since it is a relative criterion. As can be seen from the above two equations, a lower value of RMSE and a higher value of NSEC indicate a better performance of the estimation model.
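
Both criteria follow directly from Eqs. (12) and (13); a minimal Python sketch is given below, with function names chosen for illustration only.

```python
import math

def rmse(observed, computed):
    """Eq. (12): root mean square error."""
    n = len(observed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, computed)) / n)

def nsec(observed, computed):
    """Eq. (13): Nash-Sutcliffe efficiency coefficient."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((y - yh) ** 2 for y, yh in zip(observed, computed))
    variance = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - residual / variance
```

A perfect estimate gives RMSE = 0 and NSEC = 1, whereas an estimate that always returns the observed mean gives NSEC = 0, which is the usual reference point for interpreting the coefficient.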

4.1 Selection of Input Vectors

Six input combinations are evaluated to estimate the current downstream sediment load \( SSL_{d(t)} \). In all cases, \( SSL_{u(t-1)} \), the upstream sediment load of the previous day, is adopted as the only sediment input. Since the travel time of flow from the upstream to the downstream river section is within 1 day, \( Q_{u(t)} \) and \( Q_{d(t)} \) for the current day as well as \( Q_{u(t-1)} \) and \( Q_{d(t-1)} \) for the previous day are chosen to constitute the inputs. Table 2 provides the estimation performances of the DPFNN and HDFNN models for the six input combinations in the testing period. The DPFNN model is relatively insensitive to the inputs, since the RMSE and NSEC values obtained with different input combinations are comparable. The configuration \( [Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \) is the most effective input for the DPFNN model, yielding the smallest RMSE and the highest NSEC values. The HDFNN model has the best accuracy with input \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \), for which there is a 35.67 % reduction in RMSE and a 3.58 % improvement in NSEC compared with the case of input \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \). In general, the estimation models attain better performance when more effective information is provided by the inputs. This explains the lower accuracy with input \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \) for both the DPFNN and HDFNN models. In addition, \( Q_{u(t-1)} \) is a valid input variable in the HDFNN model, since the candidate discharge inputs are more likely to be exploited effectively when they are processed separately through fuzzy pattern-recognition. The above results also confirm the importance of the upstream sediment and discharges to the downstream sediment at this particular study site.

Table 2 Performances by various input combinations for DPFNN and HDFNN models

4.2 Model Performances

To draw an effective comparison between the four estimation models, their performances in the training and testing periods with the two input combinations \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \) and \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \) are presented in Table 3. The network configurations of each model are provided as well. Taking the case of \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \) as an example, (3,6,1) for the MFNN model implies that there are 3 nodes in the input layer, 6 nodes in the hidden layer and 1 node in the output layer. For the DPFNN model, the structures of the multi-layer and two-layer networks are (3,5,1) and (3,1), respectively. The HFNN model only considers the discharge inputs; thus its input nodes are \( Q_{u(t-1)} \) and \( Q_{d(t)} \). The inputs of the multi-layer and two-layer networks in the HDFNN model are different: the former receives the discharge data (\( Q_{u(t-1)} \) and \( Q_{d(t)} \)) and the latter receives the sediment data \( SSL_{u(t-1)} \).

Table 3 Performances by various models with inputs \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \) and \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \)

For the case of \( [Q_{u(t-1)}, Q_{d(t)}, SSL_{u(t-1)}] \), Table 3 shows that the DPFNN model demonstrates better generalization capability and estimation ability than the MFNN model, as indicated by the RMSE and NSEC values in both the training and testing periods. This may be attributed to its capacity to map both nonlinear and linear relationships with the double parallel networks. The HFNN model is found to be superior to the MFNN model as well, because it is capable of providing information about different patterns. However, it does not include the influence of upstream sediment loads, which results in deficient estimation. The HDFNN model combines the advantages of both the DPFNN and HFNN models and hence gives the best performance amongst the four models. Compared with the MFNN model, it achieves a 49.47 % and 42.58 % reduction in RMSE for the training and testing stages, respectively. The superiority of HDFNN over the DPFNN and HFNN models is apparent, particularly for the NSEC values in the testing period.

The above conclusions are strengthened by the results in Table 3 and Fig. 4 for the input combination \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \). The MFNN model has a fairly high RMSE value (815.4227 ton/day) and shows inferior results owing to its purely “black-box” operation. In the testing period, the RMSE value of the HDFNN model is 55.50, 55.43 and 48.24 % lower than that of the MFNN, DPFNN and HFNN models, respectively. Meanwhile, the NSEC value attained by the HDFNN model is 0.9762, which is high enough to demonstrate the ability of the HDFNN model for sediment estimation. It can also be observed that the improvement of DPFNN over MFNN is not significant with this input combination. A possible explanation is that the two-layer neural network in DPFNN becomes redundant when more input variables are included. The two-layer neural network in the HDFNN model is not a “parallel” one as in the DPFNN model, since its input differs from that of the multi-layer neural network. Therefore, the HDFNN model is still able to yield accurate results and capture the effective inputs in this case study.

Fig. 4 The observed and estimated suspended sediments by (a) MFNN, (b) DPFNN, (c) HFNN and (d) HDFNN models in the testing period with input \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \)

The time series of observed and computed SSL and the scatter plots for the various models are shown on the left- and right-hand sides of Fig. 4, respectively. The high values estimated by the HDFNN model are closer to the observed values than those of the three benchmark models. In particular, it closely fits the peak sediment load, as exhibited in Fig. 4d, while the other three models underestimate the peak value. The scatter plots show only data below 500 ton/day, which account for around 83 % of all data and are used to present the performances for relatively medium and low values. As seen from the figures, the HDFNN estimates are less scattered than those of the other three models. MFNN and HFNN over-estimate most of the values, whilst DPFNN cannot reproduce the observations lower than 50 ton/day. It can be concluded that the HDFNN model is more adequate than the others for SSL estimation, since it can simulate sediments characterized by fuzziness, nonlinearity and time variability.

4.3 Partitioning Analysis for Low, Medium and High Loads

The RMSE and NSEC statistics are based on squared errors and therefore mainly reflect the performance on high values. Thus the above evaluation criteria and plots of estimated data cannot provide explicit performances on different intervals of values. To address this problem, a partitioning analysis is undertaken for the four models in this study. It is performed by finding threshold values of the dataset and partitioning the data into several intervals (Goyal 2014). The threshold values are determined from the observed SSL data in the testing period, with the median and mean of the dataset taken as the two thresholds. Values lower than the median (30 ton/day) are regarded as “low load”, values higher than the mean (384.04 ton/day) as “high load”, and values between the median and the mean as “medium load”.
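
The partitioning step can be sketched in Python as follows. The sketch assumes that the median is below the mean, as is the case for the right-skewed data here, and assigns values exactly equal to a threshold to the medium class; both choices, as well as the function name `partition_rmse`, are assumptions made for illustration.

```python
import math
import statistics

def partition_rmse(observed, computed):
    """RMSE per load class, with thresholds taken from the observed series."""
    median = statistics.median(observed)
    mean = sum(observed) / len(observed)
    squared = {"low": [], "medium": [], "high": []}
    for y, yh in zip(observed, computed):
        if y < median:
            key = "low"       # below the median
        elif y > mean:
            key = "high"      # above the mean
        else:
            key = "medium"    # between median and mean (inclusive)
        squared[key].append((y - yh) ** 2)
    # A class with no observations is reported as None
    return {k: (math.sqrt(sum(e) / len(e)) if e else None) for k, e in squared.items()}
```

Reporting the RMSE separately for each class prevents the few very large loads from dominating the overall error measure.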

The RMSE statistics of the four models with respect to low, medium and high loads are illustrated in Fig. 5. For the ‘low load’, the HFNN model performs worse than its counterparts, with the largest RMSE value, because it disregards the upstream sediment data. The HDFNN model is adequate for estimating low SSL values. All four models are able to mimic the ‘medium load’ with comparable performances, among which the HDFNN model achieves the best result. For the ‘high load’, the RMSE obtained by the HDFNN model is much smaller than those of the other three. Overall, the performance of the HDFNN model on low, medium and high loads is consistently good, which supports the use of this approach in SSL estimation. This can mainly be attributed to the introduction of the continuity equation, which captures the time-varying characteristic of sediment loads.

Fig. 5 RMSE statistics of MFNN, DPFNN, HFNN and HDFNN models for low, medium and high values with input \( [Q_{u(t-1)}, Q_{u(t)}, Q_{d(t)}, Q_{d(t-1)}, SSL_{u(t-1)}] \)

5 Conclusions

This study is concerned with the application of the HDFNN model to suspended sediment load estimation. The fuzzy and time-varying characteristics of the sediment data are considered in the HDFNN model, while a structure of double neural networks is employed to handle the river discharge and sediment inputs separately. The estimates of the HDFNN model are compared with those of three other models (i.e. MFNN, DPFNN and HFNN) using a dataset from two stations on the Muddy Creek in Montana, USA. The results confirm the generalization and estimation ability of the HDFNN model, which attains the lowest RMSE and the highest NSEC values. The high and peak observed values are also estimated successfully by the HDFNN model, since it can address the highly nonlinear and fuzzy sediment data. Furthermore, it fits the low and medium values well, as indicated by the partitioning analysis. It can be concluded that the HDFNN model provides a superior alternative for SSL estimation. It overcomes the physically meaningless representation of the MFNN model, combines the advantages of the DPFNN and HFNN models through double neural networks and a fuzzification process, and includes the influence of upstream sediment loads. The present work is the first application that embeds physical considerations in the structure of double neural networks for modeling suspended sediment series in the downstream river section. This idea for modeling SSL can be adopted and extended to other hydrological models. Nevertheless, the stability of the proposed HDFNN model has not been verified by uncertainty analysis, which should be performed in future work. Besides, the HDFNN model only considers the inputs of upstream discharges and sediment and their corresponding physical mechanism. The influences of other inputs and hydrological processes on the SSL can be explored further to improve the estimation model.