Hybrid models for suspended sediment prediction: optimized random forest and multi-layer perceptron through genetic algorithm and stochastic gradient descent methods

Samadianfard, Saeed; Kargar, Katayoun; Shadkani, Sadra; Hashemi, Sajjad; Abbaspour, Akram; Safari, Mir Jafar Sadegh

doi:10.1007/s00521-021-06550-1


Original Article
Published: 07 October 2021

Volume 34, pages 3033–3051, (2022)

Neural Computing and Applications


Saeed Samadianfard ORCID: orcid.org/0000-0002-6876-7182¹,
Katayoun Kargar²,
Sadra Shadkani¹,
Sajjad Hashemi¹,
Akram Abbaspour¹ &
…
Mir Jafar Sadegh Safari³


Abstract

Owing to the nonlinear and non-stationary nature of suspended sediment transport in rivers, suspended sediment concentration (SSC) modeling is a challenging task in environmental engineering. Investigation of SSC is of paramount importance in river morphology and the operation of hydraulic structures. To this end, standalone random forest (RF) and multi-layer perceptron (MLP) models were first developed for SSC modeling and were then optimized with the genetic algorithm (GA) and stochastic gradient descent (SGD) to obtain the GA-MLP, GA-RF, SGD-MLP, and SGD-RF hybrid models. A variety of input scenarios were implemented to find the best input combination for SSC prediction. Streamflow and SSC data collected from two stations on the Minnesota and San Joaquin rivers, located in Minnesota and California, respectively, are used in the current study. The accuracy of the developed models is examined using three performance criteria: the correlation coefficient (CC), the scattered index (SI), and Willmott’s index of agreement (WI). A significant improvement in the accuracy of the hybrid models was observed relative to their standalone counterparts. The GA-MLP-5 and GA-RF-5 models, with CC of 0.950 and 0.944, SI of 0.290 and 0.308, and WI of 0.974 and 0.971, respectively, were found to be the best models for SSC prediction at the Minnesota River. The SGD-MLP-5 and SGD-RF-5 models, with CC of 0.900 and 0.901, SI of 0.339 and 0.339, and WI of 0.945 and 0.946, respectively, gave accurate results at the San Joaquin River. In applying the SGD algorithm, the adaptive learning rate, epochs, rho, L1, and L2 were set to 0.004, 10, 1, 0.000009, and 0, respectively. ExpRectifier was chosen as the activation function in the SGD-MLP model because it was more efficient than the alternatives for predicting SSC. According to the results, the fifth scenario, which incorporates SSC_t–1, SSC_t–2, Q_t, Q_t–1, and Q_t–2, was found superior for SSC modeling in the studied rivers. The hybrid algorithms based on GA and SGD optimization are proposed as practical tools for solving complex environmental problems.


1 Introduction

Alongside advances in hydrological and environmental evolutionary modeling, suspended sediment transport modeling has progressed significantly in recent years. Understanding the sediment transport process and modeling such a complex phenomenon are important in water resources management [29, 30]. The suspended sediment concentration (SSC) in rivers is a crucial issue in environmental, hydraulic, and water resources engineering. Sediments alter several features of river systems, such as water quality and health (through pollutant transport) and the geography and navigability of rivers and channels [20, 44]. Sediment that is conveyed within the flow and remains in suspension over a considerable river length and time is referred to as the suspended load, and its prediction is challenging because several hydrological and meteorological parameters act on a particular watershed [14, 47]. Conventionally, the sediment rating curve method is widely applied for SSC computation. It expresses an exponential relationship between river discharge and SSC through regression analysis. Because the exponential regression equation is fitted to the entire data set, it tends to over-fit and may generate poor results on a different data set. Therefore, a more precise modeling approach is needed for such a complex problem [27]. Modeling sediment transport in rivers with theoretical equations and mathematical models requires a wide range of data, and because such extensive data are rarely available, these models do not provide accurate estimations [47, 48].

Recently, numerous studies have modeled sediment transport using a variety of machine learning methods. For example, Rajaee et al. [33] applied four methods, including neuro-fuzzy (NF), artificial neural networks (ANNs), and multi-linear regression (MLR), to simulate the daily SSC in the USA. Based on the outcomes of this study, both the ANN and NF models performed well in predicting SSC. Cobaner et al. [8] developed an adaptive neuro-fuzzy model for SSC computation using streamflow, rainfall, and suspended sediment data. The adaptive neuro-fuzzy model was reported to outperform different types of ANN, such as the generalized regression neural network (GRNN), the radial basis function neural network (RBNN), and the multi-layer perceptron (MLP). Altunkaynak [2] and Zhang et al. [50] applied the genetic algorithm (GA) and found that it was more accurate than other approaches. Using neural differential evolution (NDE), NF, ANN, and sediment rating curve (SRC) methods, Kisi [20] modeled daily SSC and found that NDE outperforms the other techniques in daily SSC estimation. Comparing linear genetic programming (LGP) with the adaptive neuro-fuzzy inference system (ANFIS) and SRC methods for SSC estimation, Kisi and Guven [21] reported that the LGP model provides more accurate results than the other models. Singh and Chakrapani [43] examined the capability of the feed-forward backpropagation (FFBP) ANN method for simulating the SSC with rainfall, temperature, and discharge as model parameters. They determined that increasing the number of input parameters improves the accuracy of the developed model. Kumar et al. [23] used the classification and regression tree (CART), RBNN, MLR, ANN, M5 model tree, and least square support vector regression (LSSVR) to estimate the suspended sediment in a river basin in India. Both the ANN and LSSVR models were found to generate accurate results.

The studies mentioned above applied standalone machine learning algorithms for SSC modeling. Because of the complexity of the problem and the difficulty of tuning the many parameters of standalone models, hybrid algorithms can be reliable approaches for modeling the SSC in rivers. As examples from the literature, because of the complicated relationship between SSC and streamflow (Q), Liu et al. [24] used the wavelet-artificial neural network model (W-ANN) to predict the SSC in the Kuye River. They decomposed the daily time series of SSC and Q into subseries at different scales as inputs for the model. The W-ANN model performed best, with higher forecasting precision than other models such as SRC and ANN. Kisi and Zounemat-Kermani [22] suggested ANFIS-FCM (fuzzy c-means clustering model) to estimate the SSC in the USA. Based on the obtained results, ANFIS-FCM gave better results than other models, including ANFIS, ANN, and SRC. Zounemat-Kermani et al. [52] used data from hydrometric stations in different parts of the USA, such as Arkansas, Delaware, and Idaho, to assess four support vector regression (SVR) models and three ANN methods for daily SSC prediction, and compared them with the MLR and SRC methods. The statistical parameters indicated the superior performance of the SVR and ANN models in comparison with the traditional methods. Malik et al. [26] reported that the co-active neuro-fuzzy inference system (C-ANFIS) technique outperformed MLR, multiple nonlinear regression (MNLR), MLP, and SRC. Ghose and Samantaray [16] used ANN-FFBP and regression approaches together with their hybridized versions, GA-BPNN and GA-regression, for the same purpose at various basins of the Suktel River to assess their sensitivity at the regional scale. Liu et al. [25] modeled the SSC time series of the Kuye River watershed, China, using hybrid ensemble empirical mode decomposition (EEMD-ANN), EEMD-MLR, ANN, and MLR methods. The EEMD-ANN and EEMD-MLR models improved performance by 52.9% and 41.0% relative to the ANN and MLR methods, respectively.

Advanced hybrid machine learning algorithms have been implemented for river suspended sediment estimation as a form of evolutionary hydrological modeling. For instance, [40] assessed a hybrid model merging the support vector machine (SVM) with the whale optimization algorithm (SVM-WOA) and compared it with SVM-PSO and conventional SVM and RBFN models for estimating SSC at the Sundargarh and Salebhata stations on the Mahanadi River, India. The results showed that SVM-WOA performed better than the SVM-PSO, SVM, and RBFN models for five different input scenarios. Roushangar et al. [36] modeled SSC and river discharge at two stations on the Mississippi River and improved the performance of the implemented models using the wavelet transform (WT) and ensemble empirical mode decomposition (EEMD) approaches. The results indicated that data processing with WT was more suitable than EEMD for enhancing the models' performance. Data processing improved the models' performance by up to 15%. It was also found that, with the merged kernel extreme learning machine (KELM) method, data from the previous stations could be applied successfully for SSC and river discharge modeling when a station's own data were not available. Dang [10] improved coupled models of the discrete wavelet transform (DWT) with ANFIS, named DWT-ANFIS, and principal component analysis (PCA) with ANFIS, named PCA-ANFIS, for SSC simulation. The merged and single ANFIS models were trained and tested using long-term daily SSC and river discharge measured on the Schuylkill and Iowa Rivers in the United States. The results revealed that PCA-ANFIS performed better than both the single ANFIS and the coupled DWT-ANFIS. Mehri et al. [28] used four intelligent methods, ANFIS-PSO, ANFIS-GA, ANFIS, and the group method of data handling (GMDH), to estimate the sediment concentration distribution. Applying the GA and PSO optimization approaches improved the efficiency of the ANFIS model. The results showed that ANFIS-PSO performed better than the ANFIS-GA, ANFIS, and GMDH models for estimating the suspended sediment distribution.

While the good performance of RF and MLP has already been reported in the relevant literature, this study is designed to enhance the accuracy of the RF and MLP models for SSC modeling through hybridization with GA and SGD, yielding the novel GA-MLP, GA-RF, SGD-MLP, and SGD-RF models. The weakness of conventional regression formulas in accurately estimating suspended sediment has also been reported in the literature. To this end, this study attempts to estimate the suspended sediment concentration accurately using efficient methods such as MLP and RF and to improve the results using the GA and SGD methods. Historical SSC and river discharge values of two stations were used for the modeling. To the best of our knowledge, hybrid SGD-MLP and SGD-RF models have not previously been used for SSC estimation.

2 Materials and methods

2.1 Study area

Daily discharge (Q) and SSC data of the Minnesota and San Joaquin Rivers (Fig. 1) for the period 2000–2017 were acquired from the United States Geological Survey (USGS). The Minnesota River, a tributary of the Mississippi River, is about 534 km long and lies in the US state of Minnesota. The corresponding USGS station is named Minnesota, with station number 05325000, a basin area of 38,590 km², latitude 44° 10′08″, and longitude 94° 00′11″. The San Joaquin River is located in central California. The corresponding USGS station is named San Joaquin, with station number 11303500, a basin area of 35,065 km², latitude 37° 40′34″, and longitude 121° 15′55″. Table 1 shows the statistical characteristics of the SSC and Q parameters at both the Minnesota and San Joaquin stations. It can be seen from Table 1 that the SSC and Q parameters have skewed distributions.

Table 1 Statistical features of the applied data


2.2 Multi-layer perceptron (MLP)

The multi-layer perceptron (MLP) is among the most common neural networks for accurate nonlinear fitting. To achieve high precision, an appropriate number of layers and neurons must be selected for its structure. The training process finds suitable weights for the connections between neurons. Backpropagation is a common training algorithm for feed-forward neural networks, in which the output of each layer is the input of the next one [13]. The MLP consists of three types of layers: input, hidden, and output. Several algorithms can be used to train MLP networks, and the choice among them affects both the accuracy and the learning rate of the network [6]. Figure 2 illustrates the flowchart of the MLP model. The number of hidden layers and the number of neurons in each hidden layer should be chosen so that the model gives high performance. The Levenberg–Marquardt (LM) algorithm is the one most often used with the MLP. Although several hidden layers can be adopted in the MLP structure, one hidden layer provides satisfactory results in hydrological problems [38].
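To make the layer-by-layer flow concrete, the following is a minimal sketch of the forward pass of an MLP with a single hidden layer. It is an illustration only, not the implementation used in this study: the function name `mlp_forward`, the tanh hidden activation, the linear output, the layer sizes, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> tanh hidden layer -> single linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # The output of the hidden layer is the input of the output layer.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


rng = random.Random(0)
n_inputs, n_hidden = 5, 3  # hypothetical sizes (five inputs, three hidden neurons)
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

x = [0.2, 0.1, 0.5, 0.4, 0.3]  # made-up, already scaled input vector
y_hat = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training would then adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce the prediction error; that step is omitted here.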

2.3 Random forest (RF)

The random forest (RF) is an ensemble learning method proposed by Breiman [5]. To achieve better generalization, ensemble learning builds multiple base learners, here by combining several trees [31, 37]. Among the various rule generation approaches, RF is an effective and practical one. RF uses the decision tree algorithm (classification and regression tree, CART) as its base algorithm. Notably, RF is more robust than other decision tree ensembles [9]. The RF method can identify the relevant predictors and, unlike some other techniques, does not require the data to be re-scaled. A conventional regression tree performs weakly because it tends to over-fit the training data set. The RF method uses randomness to overcome this problem [42]. In this method, each decision tree is built from a bootstrap sample of the calibration dataset, which comprises about two-thirds of the samples. The remaining elements are treated as out-of-bag data. At each node, candidate variables are drawn at random, and the best split is selected as the one with the lowest Gini index. Repeating this for each bootstrap sample yields predictions for the out-of-bag data. The growth of each tree continues until a predefined stop condition is reached. In the RF method, the following parameters are optimized using the mean square error (MSE) on the calibration dataset: confidence, number of trees, minimal leaf size, maximal depth, minimal size for split, subset ratio, and number of prepruning alternatives [12]. The RF method is effective for analyzing large datasets. Figure 3 illustrates the structure of the RF method.
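The bootstrap and out-of-bag idea described above can be sketched in a few lines. This is a hedged illustration rather than the software used in the study: the helper name `bootstrap_split` and the sample size are assumptions. Sampling n items with replacement leaves roughly 1 − 1/e ≈ 63% of the distinct samples in the bag, which is the "about two-thirds" figure, and the rest become out-of-bag data.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(42)
n = 10_000  # hypothetical calibration set size
in_bag, oob = bootstrap_split(n, rng)

unique_fraction = len(set(in_bag)) / n  # share of distinct samples seen by one tree
oob_fraction = len(oob) / n             # share left out, usable for validation
```

In a full RF, one tree would be grown on each `in_bag` sample and evaluated on the matching `oob` indices.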

2.4 Genetic algorithm (GA)

Holland [18] and Goldberg [17] developed the genetic algorithm (GA). GA is a powerful heuristic method for solving large-scale combinatorial optimization problems. GA encodes the problem as a set of strings made up of small units (genes) and then modifies the strings to simulate the process of gradual evolution. Unlike local search algorithms, which keep only one candidate solution at a time, GAs perform a global search over a population of individuals. Working with a population makes it possible to examine the main structures and characteristics of different individuals, which leads to the identification and discovery of more efficient solutions [15]. In this study, the GA retains the most relevant candidates and eliminates those that are less relevant to the population under study. Each member of the population, which is an approximation of the final answer, is encoded as a string of characters or numbers. These strings are called chromosomes. The most common representation uses the binary digits zero and one. Representations based on ternary digits, real numbers, and integers are also used. The values on the chromosomes have no specific meaning on their own; they must be decoded and acquire meaning only as decision variables. Note that the search is performed on the encoded information, unless genes with real values are used. Once the chromosomes have been encoded, the efficiency, or fitness, of each member of the population can be calculated. Fitness is a relative scale that indicates the suitability of individuals for producing the next generation. In nature, fitness is equivalent to the ability to survive. The objective function plays a decisive role in determining the fitness of individuals. During reproduction, the fitness of each individual is determined from the raw information obtained from the objective function. These values are used in the selection process to steer it toward suitable individuals. The higher the fitness of an individual relative to the population, the greater its chance of being selected. The lower the relative fitness, the less likely the individual is to be selected for producing the next generation. Crossover (recombination) in a GA exchanges genetic information between two or more individuals. The simplest type is single-point crossover. After this step, a mutation operator is applied to the generated strings with some probability. In mutation, each individual can change on its own according to the laws of probability. Mutation means changing the value of one of the string cells from zero to one or from one to zero. After the crossover and mutation steps, the chromosomes are decoded, the value of the objective function is calculated, and a fitness value is assigned. If necessary, the selection and reproduction steps are performed again. During this process, the average fitness of the population is expected to increase. The algorithm ends when a specific goal is met: for example, when a given number of generations has been created, the fitness of the individuals reaches a given level, or a given point in the search space is reached [4].
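The loop described above (encode, evaluate fitness, select, cross over, mutate, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GA configuration used in this study: the function name `run_ga`, the binary encoding, roulette-wheel selection, the rates, and the toy fitness function (the number of ones in the chromosome) are all chosen for the example, and the population size is assumed to be even.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize a non-negative fitness over binary chromosomes."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    for _ in range(generations):  # termination: fixed number of generations
        scores = [fitness(ind) for ind in population]
        # Fitness-proportional selection; the offset avoids all-zero weights.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < crossover_rate:
                point = rng.randrange(1, n_bits)  # single-point crossover
                a[point:], b[point:] = b[point:], a[point:]
            children.extend([a, b])
        for child in children:
            for i in range(n_bits):
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]  # bit-flip mutation
        population = children
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best[:]
    return best


# Toy objective: count the ones in the chromosome.
best = run_ga(sum, n_bits=20)
```

For model tuning, the chromosome would instead encode candidate hyperparameters and the fitness would be derived from a validation error, which is outside the scope of this sketch.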

2.5 Stochastic gradient descent (SGD)

Stochastic gradient descent (SGD) is an iterative method for optimizing a differentiable function, called the target function, and is a stochastic approximation of the gradient descent (GD) method. In effect, SGD searches over several iterative loops for the minimum value of a function and for the parameter values at which that minimum is attained. The difference between SGD and standard GD is that standard GD uses all of the training data to optimize the target function, whereas SGD uses a randomly selected subset of the training data. This method has many applications in statistics and machine learning [34].

In machine learning applications, a common problem is to define a function f of the statistical data with one or more parameters and then to choose these parameters so that the sum (or average) of the values of f over all data points is as small as possible. Assume a set of statistical data for which the function f depends only on the parameter $\theta$. Applying f to the ith data point of the set then gives a function of $\theta$, denoted ${\vartheta}_{i}(\theta)$. The problem thus reduces to finding a $\theta$ that minimizes the following expression:

$$ \vartheta\left(\theta\right) = \frac{1}{n}\sum_{i=1}^{n} \vartheta_{i}\left(\theta\right) = E\left[\vartheta_{i}\left(\theta\right)\right] $$

(1)

where $\vartheta(\theta)$ is the target function. In many cases, the target function is simple enough that minimizing it is neither intricate nor time consuming, and the standard GD can be used; an example is the one-parameter exponential family of functions used to appraise economic functions. However, because the standard GD method requires the gradient of the objective function over the whole training data set in each loop, the calculation in each loop can be very time consuming and intricate when the number of target function parameters is large or the training data set is very large. For this reason, SGD is used, which performs this operation in each loop for only a part of the training data set. In practice, the operation in each loop is usually performed not on a single randomly selected member of the training data set but on a randomly selected subset of it, for two reasons:

1. The dispersion of the parameter values obtained in each loop is reduced, so convergence is more stable.
2. Matrix operations, which execute very fast, can be used.
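The first reason can be checked numerically. The sketch below compares the spread of gradient estimates computed from a single sample with that of estimates averaged over a subset. The one-parameter squared-error objective, the synthetic data, the batch size of 32, and the helper names are assumptions made for this illustration only.

```python
import random
import statistics


def sample_gradient(theta, x, y):
    """Gradient of 0.5 * (theta * x - y) ** 2 with respect to theta."""
    return (theta * x - y) * x


def minibatch_gradient(theta, batch):
    """Average of the per-sample gradients over a subset of the data."""
    return sum(sample_gradient(theta, x, y) for x, y in batch) / len(batch)


rng = random.Random(1)
data = []
for _ in range(1000):  # synthetic data: y = 2x + noise
    x = rng.uniform(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

theta = 0.0
single = [minibatch_gradient(theta, rng.sample(data, 1)) for _ in range(2000)]
batched = [minibatch_gradient(theta, rng.sample(data, 32)) for _ in range(2000)]

var_single = statistics.pvariance(single)    # spread of one-sample estimates
var_batched = statistics.pvariance(batched)  # spread of 32-sample estimates
full_gradient = minibatch_gradient(theta, data)
```

Both kinds of estimate are centered on `full_gradient`, but the averaged one fluctuates much less, which is why updates based on subsets tend to converge more steadily.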

2.6 GA-MLP

Determining the number of neurons in the hidden layers, the training cycles, the learning rate, the momentum, the error epsilon, and the local random seed is one of the more complicated steps in MLP modeling. For this purpose, a hybrid algorithm combining GA and MLP was used to model the SSC in rivers. The process starts with a random initial population in which each individual encodes a different number of hidden layer neurons. Then, the elite population with the best individuals is selected. The model is run repeatedly, and the fitness function is calculated and stored for each individual. In the last stage, if the termination criteria are satisfied, the individual with the best fitness value is saved. Otherwise, the process continues with a new population until a suitable one is found. The Levenberg–Marquardt algorithm is the one most often used in the training stage of this process, but it has a random nature. Using GA protects the model against this problem and selects the best transfer function for the hidden and output layers. Figure 4 displays the flowchart of the GA-MLP method.

2.7 GA-RF

To improve the efficiency and accuracy of the RF model, the key parameters of its structure need to be optimized [35]. To achieve higher performance, Zhou et al. [51] applied a novel approach that uses a small number of high-quality learners. According to [1], individual trees contribute unequally to the accuracy of a traditional RF; for example, some of them make incorrect predictions and reduce the performance and efficiency of the model. Different strategies, including hill climbing and greedy algorithms, are used to increase the model’s accuracy; however, they have some deficiencies. For instance, using the greedy method to improve the RF model can leave it trapped at a local optimum. Therefore, GA is implemented to solve that problem by choosing the best subset of features that improves the RF model performance. Consequently, the RF model optimized by GA is more accurate than the traditional one. Figure 5 shows the flowchart of the GA-RF method.

2.8 SGD implementation on MLP and RF

In the general implementation of SGD, $\theta$ is the vector that includes all the parameters of the cost function. First, $\theta$ is set to an initial vector. Then, for each update of this vector, a member of the training data set is randomly selected, and the gradient vector of the cost function at point $\theta$, scaled by the rate $\alpha$, is subtracted from $\theta$:

$$ \theta = \theta - \alpha \nabla_{\theta} \vartheta_{i}\left(\theta; x^{(i)}, y^{(i)}\right) $$

(2)

where $\vartheta$ is the cost function, $(x^{(i)}, y^{(i)})$ is a randomly selected member of the training data, and ${\vartheta}_{i}(\theta; x^{(i)}, y^{(i)})$ denotes the ith term of the objective function. $\alpha$ is the rate at which $\theta$ is updated; its value is set experimentally, since convergence is slow if it is too small and may not occur at all if it is too large [45].

In another implementation, a random member of the data set is not selected in each loop. Instead, the entire data set is randomly rearranged once per loop, and the update is then performed in the order ${\vartheta}_{1}$, ${\vartheta}_{2}$, …, ${\vartheta}_{n}$, where n is the size of the training dataset. The following pseudocode describes this implementation:

1. Give $\theta$ and $\alpha$ their initial values.
2. Repeat until the minimum is reached:
3. Randomly rearrange the training data.
4. Repeat for i from 1 to n: $\theta = \theta - \alpha {\nabla}_{\theta}{\vartheta}_{i}(\theta; x^{(i)}, y^{(i)})$
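The four steps above can be sketched as runnable code. This is an illustrative example under stated assumptions, not the optimizer used in the study: the model is a two-parameter line fitted with a squared-error cost, a fixed number of passes over the data stands in for "until the minimum is reached", and the name `sgd_fit`, the rate, and the toy data are hypothetical.

```python
import random


def sgd_fit(data, alpha=0.05, epochs=500, seed=0):
    """Fit y ~ a * x + b by SGD, reshuffling the training data once per pass."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0                   # step 1: initial theta = (a, b) and alpha
    order = list(range(len(data)))
    for _ in range(epochs):           # step 2: fixed budget replaces the stop test
        rng.shuffle(order)            # step 3: random rearrangement of the data
        for i in order:               # step 4: one update per training sample
            x, y = data[i]
            error = (a * x + b) - y   # derivative of 0.5 * error ** 2 w.r.t. prediction
            a -= alpha * error * x
            b -= alpha * error
    return a, b


# Noiseless toy data generated from y = 3x + 1 on x = 0.0, 0.1, ..., 1.0.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(11)]
a, b = sgd_fit(data)
```

With noiseless data and a small enough rate, the estimates `a` and `b` approach the generating slope and intercept; with noisy data they would instead fluctuate around the least-squares solution.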

Usually, the update is not computed from a single member of the training data set but from a subset of the data, called a mini-batch. Figure 6 shows how GD works for single-input and dual-input functions.

The SGD algorithm offers advanced features, including epochs, rho, L₁ or L₂ regularization, momentum training, an adaptive learning rate, and rate annealing, that enable high prediction precision for both the MLP and RF models. When the MLP is optimized with SGD, the network contains many hidden layers of neurons with tanh (hyperbolic tangent), rectifier (the maximum of (0, x), where x is the input value), maxout (the maximum coordinate of the input vector), or ExpRectifier (exponential rectifier) activation functions.
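The four activation functions named above can be written out directly. This sketch follows the definitions given in the text for tanh, rectifier, and maxout; for the exponential rectifier, the source gives no formula, so the common exponential linear unit form with a scale `alpha` of 1 is assumed here and may differ from the software's exact definition.

```python
import math


def tanh(x):
    """Hyperbolic tangent activation."""
    return math.tanh(x)


def rectifier(x):
    """Rectifier: the maximum of (0, x)."""
    return max(0.0, x)


def maxout(inputs):
    """Maxout: the maximum coordinate of the input vector."""
    return max(inputs)


def exp_rectifier(x, alpha=1.0):
    """Exponential rectifier in the assumed ELU form; alpha is a hypothetical scale."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
table = [(x, tanh(x), rectifier(x), exp_rectifier(x)) for x in samples]
```

Unlike the plain rectifier, the assumed exponential form returns small negative values for negative inputs instead of exactly zero.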

When the adaptive learning rate is disabled, the size of the weight updates is set by the user-defined learning rate and is a function of the difference between the predicted value and the target value. That difference, commonly named delta, is available only at the output layer. Backpropagation is used to obtain the correct output at each hidden layer. The momentum is ramped up slowly because excessive momentum can result in oscillation. The rate annealing, dropout rate annealing, and momentum training parameters become active only if the adaptive learning rate is disabled.

2.9 Performance criteria

In this study, three performance indexes including correlation coefficient (CC), Willmott’s index of agreement (WI), and scattered index (SI) are utilized in order to measure the model’s accuracy. The mathematical description of CC, SI, and WI can be expressed, respectively, as follows:

$$ {\text{CC}} = \frac{\sum_{i=1}^{n} O_{i} P_{i} - \frac{1}{n}\sum_{i=1}^{n} O_{i} \sum_{i=1}^{n} P_{i}}{\sqrt{\left(\sum_{i=1}^{n} O_{i}^{2} - \frac{1}{n}\left(\sum_{i=1}^{n} O_{i}\right)^{2}\right)\left(\sum_{i=1}^{n} P_{i}^{2} - \frac{1}{n}\left(\sum_{i=1}^{n} P_{i}\right)^{2}\right)}} $$

(3)

$$ {\text{SI}} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(P_{i} - O_{i}\right)^{2}}}{\overline{O}} $$

(4)

$$ {\text{WI}} = 1 - \left[\frac{\sum_{i=1}^{n} \left(O_{i} - P_{i}\right)^{2}}{\sum_{i=1}^{n} \left(\left|P_{i} - \overline{O}\right| + \left|O_{i} - \overline{O}\right|\right)^{2}}\right] $$

(5)

in which P_i is the ith predicted value, O_i is the ith observed value, Ō is the mean of the observed values, and n is the number of data points. CC measures the type and degree of the relationship between two quantitative variables: it indicates both the strength of the relationship and its direction (direct or inverse). The coefficient lies between −1 and 1 and equals zero if there is no linear relationship between the two variables. If the range of the data is large, the root mean square error (RMSE) will also be large, which does not by itself indicate that the model is inaccurate. For this reason, the SI was used in this study; it is obtained by dividing the RMSE by the mean of the observed test data. The closer the SI is to zero, the more accurate the model. WI is a standardized measure of the model prediction error, with values between zero and one. WI = 1 indicates perfect agreement and WI = 0 indicates no agreement. This index is also highly sensitive to extreme values because it uses squared differences [49]. Furthermore, Taylor diagrams were used for additional investigation of the performance of the hybrid GA-MLP, GA-RF, SGD-MLP, and SGD-RF models in SSC estimation [46].
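The three criteria can be computed with a few lines of code. The sketch below is an illustration, not the authors' evaluation script: the function names are hypothetical, CC is written in the standard Pearson form (with the square root in the denominator), SI divides the RMSE by the mean of the observations, and WI uses the mean of the observations in both absolute-value terms.

```python
import math


def correlation_coefficient(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    s_o, s_p = sum(obs), sum(pred)
    cov = sum(o * p for o, p in zip(obs, pred)) - s_o * s_p / n
    var_o = sum(o * o for o in obs) - s_o ** 2 / n
    var_p = sum(p * p for p in pred) - s_p ** 2 / n
    return cov / math.sqrt(var_o * var_p)


def scatter_index(obs, pred):
    """RMSE divided by the mean of the observed values."""
    n = len(obs)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    return rmse / (sum(obs) / n)


def willmott_index(obs, pred):
    """Willmott's index of agreement (1 means perfect agreement)."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


observed = [1.0, 2.0, 3.0, 4.0]   # made-up values for illustration
predicted = [1.1, 1.9, 3.2, 3.8]  # made-up values for illustration
cc = correlation_coefficient(observed, predicted)
si = scatter_index(observed, predicted)
wi = willmott_index(observed, predicted)
```

A perfect prediction gives CC = 1, SI = 0, and WI = 1, which is a convenient sanity check before applying the functions to real series.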

3 Results and discussion

In the current study, the MLP, RF, GA-MLP, GA-RF, SGD-MLP, and SGD-RF models were used with different input combinations to estimate the SSC at two stations, and their results are examined in terms of estimation accuracy. Moreover, because data-driven methods offer no direct rule for dividing the data into training and testing sets, different proportions have been used in the literature. For example, Choubin [7] used 63% of the data for training, whereas Qasem et al. [32] and Kargar et al. [19] used 67%, Dodangeh et al. [11], Asadi et al. [3], Shabani et al. [41], and Samadianfard et al. [39] used 70%, and Zounemat-Kermani et al. [53] used 80% of the whole data for model development. Therefore, in this study, the data were split into training (70%) and testing (30%) sets. Accordingly, the period 2000–2012 was used to train the models, and the 2013–2017 data were used as the test set.

Table 2 provides the input parameters of each model, where the SSC and Q parameters are given at the current time (SSC_t and Q_t) and at previous daily lags. It can be seen from Table 2 that six different scenarios were considered for SSC estimation using different input combinations of the SSC_t–1, SSC_t–2, SSC_t–3, Q_t, Q_t–1, Q_t–2, and Q_t–3 parameters. It is worth noting that the scenarios were selected on the basis of the autocorrelation of the SSC and Q variables. For predicting SSC through MLP and RF optimized with the SGD algorithm at the Minnesota and San Joaquin rivers, the adaptive learning rate was activated because of its better performance and set to 0.004. Moreover, the epochs, rho, L1, and L2 were set to 10, 1, 0.000009, and 0, respectively. In addition, ExpRectifier was chosen as the activation function in the SGD-MLP model because it was more efficient than the other activation functions for predicting SSC. Tables 3 and 4 show the default and optimized parameters used to develop the RF and GA-RF models for estimating the SSC at the Minnesota and San Joaquin stations, respectively. Similarly, the parameters of the MLP and GA-MLP models are given in Tables 5 and 6 for the same stations.
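To illustrate how lagged inputs and a chronological split can be prepared, the sketch below builds the inputs of the fifth scenario (SSC_t–1, SSC_t–2, Q_t, Q_t–1, Q_t–2) and then takes the first 70% of the rows for training. The function names and the short series are made up for the example; they are not the study's data or code.

```python
def build_scenario5(ssc, q):
    """Return (inputs, targets) with inputs [SSC_t-1, SSC_t-2, Q_t, Q_t-1, Q_t-2]."""
    inputs, targets = [], []
    for t in range(2, len(ssc)):  # the first two days lack the required lags
        inputs.append([ssc[t - 1], ssc[t - 2], q[t], q[t - 1], q[t - 2]])
        targets.append(ssc[t])
    return inputs, targets


def chronological_split(rows, train_fraction=0.7):
    """Keep time order: the earliest rows train the model, the latest test it."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up daily series, for illustration only.
ssc = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 18.0, 16.0, 13.0, 12.0, 11.0, 10.0]
q = [5.0, 6.0, 8.0, 7.0, 5.0, 7.0, 9.0, 8.0, 6.0, 6.0, 5.0, 5.0]

X, y = build_scenario5(ssc, q)
X_train, X_test = chronological_split(X)
y_train, y_test = chronological_split(y)
```

Splitting by time rather than at random keeps future observations out of the training set, which matches the use of the earlier years for training and the later years for testing.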

Table 2 Implemented models and their input parameters

Table 3 Parameters of the RF and GA-RF models (Minnesota station)

Table 4 Parameters of the RF and GA-RF models (for San Joaquin station)

Table 5 Parameters of the MLP and GA-MLP models (for Minnesota station)

Table 6 Parameters of the MLP and GA-MLP models (for San Joaquin station)

Table 7 gives the results of the RF, GA-RF, and SGD-RF models at the Minnesota station. Among the input combinations considered, the most accurate estimates were obtained from RF-5, with CC of 0.938, SI of 0.325, and WI of 0.968, among the standalone RF models, and from GA-RF-5 and SGD-RF-4, with CC of 0.944 and 0.943, SI of 0.308, and WI of 0.971, among the hybrid RF models. Notably, GA-RF-5 is more accurate than RF-5: GA reduced the SI by 5.2%. In GA-RF-5, the number of trees is 27, the maximal depth is 27, the confidence is 0.383, the minimal leaf size is 55, the minimal size for split is 51, the number of prepruning alternatives is 77, and the subset ratio is 0.9009 (Table 3). Evidently, GA plays a vital role as an optimizer in SSC estimation. At the Minnesota station, based on the results in Table 8, MLP-4 has CC of 0.948, SI of 0.296, and WI of 0.973, and GA-MLP-5 has CC of 0.950, SI of 0.290, and WI of 0.974. These two models give more accurate results than the other models. GA enhanced the precision of the model, so GA-MLP-5 gives the more accurate predictions; as Table 8 shows, GA decreased the SI by 2%. The GA-MLP-5 model has 77 training cycles, a learning rate of 0.1825, a momentum of 0.0774, an error epsilon of 0, and a local random seed of 77 (Table 5). Table 8 also shows that although SGD increased the prediction accuracy of the standalone MLP model, it was less capable than GA of reducing the prediction errors. By and large, at the Minnesota station, GA-MLP-5 performs more accurately than the other optimized models.
Also, among the SGD-MLP and SGD-RF models, SGD-MLP-6, with CC of 0.948, SI of 0.299, and WI of 0.973, and SGD-RF-5, with CC of 0.943, SI of 0.308, and WI of 0.971, performed most accurately.
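The three criteria and the reported SI reductions can be checked with a short sketch. The formulas below are the commonly used definitions (Pearson correlation for CC, RMSE divided by the observed mean for SI, and Willmott's 1982 index for WI) and are assumed here, since this section does not restate them. All function names are hypothetical.

```python
import math


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient (CC) between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (dev_o * dev_p)


def scatter_index(obs, pred):
    """Scatter index (SI): RMSE normalized by the mean of the observations."""
    n = len(obs)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)
    return rmse / (sum(obs) / n)


def willmott_index(obs, pred):
    """Willmott's index of agreement (WI); 1 indicates perfect agreement."""
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def relative_reduction(before, after):
    """Percentage reduction of an error measure relative to its original value."""
    return 100.0 * (before - after) / before


# SI values reported for the Minnesota station.
print(round(relative_reduction(0.325, 0.308), 1))  # RF-5 versus GA-RF-5
print(round(relative_reduction(0.296, 0.290), 1))  # MLP-4 versus GA-MLP-5
```

With the SI values quoted in the text, the two reductions come out at about 5.2% and 2.0%, matching the reported figures.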

Table 7 General results of the computations for the RF, GA-RF and SGD-RF models (Minnesota station)

Table 8 General results of the computations for the MLP, GA-MLP and SGD-MLP models (Minnesota station)

Table 9 displays the results of the RF, GA-RF, and SGD-RF models at the San Joaquin station. RF-4, with CC of 0.884, SI of 0.366, and WI of 0.938, is the best of the RF combinations, and GA-RF-4, with CC of 0.892, SI of 0.353, and WI of 0.942, performs best among the GA-RF models. GA-RF-4 has 27 trees with a maximal depth of 27, a confidence of 0.3826, a minimal leaf size of 55, a minimal size for split of 51, 0 prepruning alternatives, and a subset ratio of 0.9009 (Table 4). According to Table 9, GA and SGD improved the precision of all standalone models. At the San Joaquin station, SGD-MLP-4 and SGD-RF-5 perform best, with CC of 0.901, SI of 0.335 and 0.339, and WI of 0.941 and 0.941, respectively (Table 10). At this station, SGD reduced the SI by 6.9% and 8.6% relative to the RF-5 and MLP-4 models, respectively.

Table 9 General results of the computations for the RF, GA-RF, and SGD-RF models (San Joaquin station)

Table 10 General results of the computations for the MLP, GA-MLP and SGD-MLP models (San Joaquin station)

As seen from Figs. 7 and 8, the GA-RF-5 and GA-MLP-6 models estimated SSC more accurately than the other models at the Minnesota station. Furthermore, at the San Joaquin station, the SGD-RF-5 and SGD-MLP-4 models performed better. At both stations, the hybrid optimized models predicted the SSC values accurately, whereas the other models gave poorer outcomes. The scatter plots in Figs. 9 and 10 show the most accurate MLP, RF, GA-MLP, GA-RF, SGD-MLP, and SGD-RF models at the two studied stations. Overall, the combination of input parameters has no significant effect on the outcomes. For instance, the best models were GA-MLP-5, with SSC_t-1, SSC_t-2, Q_t, Q_t-1, and Q_t-2 as input parameters, at the Minnesota station and SGD-MLP-4, with inputs of SSC_t-1, Q_t, and Q_t-1, at the San Joaquin station.

Additionally, Taylor diagrams were used to examine the standard deviation and correlation values between observed and estimated SSC. Fig. 11 displays the RF-5, GA-RF-5, SGD-RF-5, MLP-4, GA-MLP-5, and SGD-MLP-6 models for the Minnesota station and the RF-4, GA-RF-4, SGD-RF-5, MLP-4, GA-MLP-4, and SGD-MLP-4 models for the San Joaquin station. The distance from the green point (the reference point) to each model point represents the centered root mean square error (RMSE). Consequently, the model whose point lies closest to the green point is the most precise [46]. According to Fig. 11, the red point (GA-MLP-5) is nearest to the reference point at the Minnesota station, and the light blue point (SGD-MLP-4) is nearest at the San Joaquin station; these models thus provide the best SSC estimates.
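The geometry behind this reading of the Taylor diagram can be illustrated with a small sketch. It assumes the standard relation from Taylor [46], in which the squared centered RMSE equals the sum of the two squared standard deviations minus twice their product times the correlation. The function name `taylor_statistics` and the sample values are hypothetical.

```python
import math


def taylor_statistics(obs, pred):
    """Return (std_obs, std_pred, correlation, centered_rmse).

    Population statistics (division by n) are used so that the
    law-of-cosines relation of the Taylor diagram holds exactly.
    """
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    corr = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / (n * std_o * std_p)
    # Centered RMSE: RMSE after removing the mean from both series.
    crmse = math.sqrt(
        sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred)) / n
    )
    return std_o, std_p, corr, crmse


# Placeholder values (not study data).
observed = [12.0, 30.0, 45.0, 22.0, 18.0, 60.0]
estimated = [15.0, 27.0, 40.0, 25.0, 16.0, 55.0]

std_o, std_p, corr, crmse = taylor_statistics(observed, estimated)
# Distance between the reference point and the model point on the diagram.
distance = math.sqrt(std_o ** 2 + std_p ** 2 - 2.0 * std_o * std_p * corr)
print(f"centered RMSE: {crmse:.4f}, diagram distance: {distance:.4f}")
```

The two printed quantities agree, which is why the model point closest to the reference point has the smallest centered RMSE.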

Sediment transport is a complicated process, and SSC estimation is a difficult, fundamental hydrological problem. Numerous conventional regression alternatives are available in the literature; however, applying them to rivers under different climate conditions is challenging. In the recent decade, runoff-suspended sediment load modeling has attracted interest in machine learning algorithms for developing robust models with high computational ability. This study therefore focused on the applicability of MLP and RF, hybridized with the GA and SGD methods, for SSC prediction. A variety of scenarios were implemented to find the best input combination from historical records. The combinations SSC_t-1, Q_t, Q_t-1 and SSC_t-1, SSC_t-2, Q_t, Q_t-1, Q_t-2 gave the most accurate results, which means that records from the previous one and two days can be linked to the upcoming day's SSC value. The results showed that the GA-MLP, GA-RF, SGD-MLP, and SGD-RF models successfully estimated SSC in two different rivers. The present study may be extended by applying the suggested algorithms to other rivers with different climate conditions. In the current research, the GA and SGD algorithms were implemented as optimizers of the MLP and RF models. Future studies may consider alternative optimization algorithms for SSC modeling.

4 Conclusions

The sedimentation problem is an essential issue in the hydrological sciences because of its important role in river hydraulics. In the current research, the MLP and RF methods and their hybrid forms with GA and SGD are proposed to estimate SSC at the Minnesota and San Joaquin rivers. For each method, different input combinations were implemented. The performance of the models was examined to identify the best one. The results demonstrated that optimization with GA and SGD increased the accuracy of the standalone models. Overall, GA-MLP and SGD-MLP were found to be robust techniques for modeling SSC, based on the SI, CC, and WI statistical indexes. Moreover, assessing the precision of the models revealed that the standalone models predicted SSC less accurately than their hybrid counterparts. In conclusion, these models can be used in water resource management and other fields of engineering with a high degree of confidence. In addition to the streamflow and suspended sediment load variables, incorporating further hydrological parameters in the modeling appears worthwhile for improving model credibility.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Adnan MN, Islam MZ (2016) Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm. Knowl-Based Syst 110:86–97
2. Altunkaynak A (2009) Sediment load prediction by genetic algorithms. Adv Eng Softw 40(9):928–934
3. Asadi E, Isazadeh M, Samadianfard S, Ramli MF, Mosavi A, Nabipour N, Shamshirband S, Hajnal E, Chau KW (2020) Groundwater quality assessment for sustainable drinking and irrigation. Sustainability 12:177
4. Bäck T, Fogel DB, Michalewicz Z (2000) Evolutionary computation 1: Basic algorithms and operators. Institute of Physics Pub, Bristol
5. Breiman L (2001) Random forests. Mach Learn 45:5–32
6. Chong EK, Zak SH (2013) An introduction to optimization. Wiley, NY
7. Choubin B (2020) Spatial hazard assessment of the PM10 using machine learning models in Barcelona Spain. Sci Total Environ 701:134474
8. Cobaner M, Unal B, Kisi O (2009) Suspended sediment concentration estimation by an adaptive neuro-fuzzy and neural network approaches using hydro-meteorological data. J Hydrol 367:52–61
9. Cutler A, Cutler DR, Stevens JR (2011) Random forests. In: Ensemble Machine Learning, pp 157–176
10. Dang MN (2021) Integration of ANFIS with PCA and DWT for daily suspended sediment concentration prediction. Water SA 47:200–209
11. Dodangeh E, Choubin B, Eigdir AN (2019) Integrated machine learning methods with resampling algorithms for flood susceptibility prediction. Sci Total Environ 9:135983
12. Douglas RK, Nawar S, Alamar MC, Mouazen AM, Coulon F (2018) Rapid prediction of total petroleum hydrocarbons concentration in contaminated soil using vis-NIR spectroscopy and regression techniques. Sci Total Environ 616–617:147–155
13. Du KL, Swamy MN (2006) Neural networks in a soft computing framework. Springer Science & Business Media, Berlin
14. Frings RM, Kleinhans MG (2008) Complex variations in sediment transport at three large river bifurcations during discharge waves in the river Rhine. Sedimentology 55:1145–1171
15. Gallagher K, Sambridge M (1994) Genetic algorithms: a powerful tool for largescale nonlinear optimization problems. Comput Geosci 20(7):1229–1236
16. Ghose D, Samantaray S (2018) Modelling sediment concentration using back propagation neural network and regression coupled with genetic algorithm. Procedia Comput Sci 125:85–92
17. Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley Longman Publishing Co., Inc.
18. Holland JH (1992) Genetic algorithms. Sci Am 267:66–72
19. Kargar K, Samadianfard S, Parsa J, Nabipour N, Shamshirband S, Mosavi A, Chau KW (2020) Estimating longitudinal dispersion coefficient in natural streams using empirical models and machine learning algorithms. Eng Appl Comput Fluid 14(1):311–322
20. Kisi O (2010) River suspended sediment concentration modeling using a neural differential evolution approach. J Hydrol 389:227–235
21. Kisi O, Guven A (2010) A machine code-based genetic programming for suspended sediment concentration estimation. Adv Eng Softw 41:939–945
22. Kisi O, Zounemat-Kermani M (2016) Suspended sediment modeling using neurofuzzy embedded fuzzy c-means clustering technique. Water Resour Manage 30:3979–3994
23. Kumar D, Pandey A, Sharma N, Flügel WA (2016) Daily suspended sediment simulation using machine learning approach. CATENA 138:77–90
24. Liu QJ, Shi ZH, Fang NF, Zhu HD, Ai L (2013) Modeling the daily suspended sediment concentration in a hyperconcentrated river on the Loess Plateau, China, using the Wavelet–ANN approach. Geomorphology 186:181–190
25. Liu QJ, Zhang HY, Gao KT, Xu B, Wu JZ, Fang NF (2019) Time-frequency analysis and simulation of the watershed suspended sediment concentration based on the Hilbert-Huang transform (HHT) and artificial neural network (ANN) methods: A case study in the Loess Plateau of China. CATENA 179:107–118
26. Malik A, Kumar A, Piri J (2017) Daily suspended sediment concentration simulation using hydrological data of Pranhita River Basin, India. Comput Electron Agric 138:20–28
27. McBean EA, Al-Nassri S (1988) Uncertainty in suspended sediment transport curves. J Hydrol Eng, ASCE 114(1):63–74
28. Mehri Y, Nasrabadi M, Omid MH (2021) Prediction of suspended sediment distributions using data mining algorithms. Ain Shams Engineering Journal
29. Meshram SG, Safari MJS, Khosravi K, Meshram C (2021) Iterative classifier optimizer-based pace regression and random forest hybrid models for suspended sediment load prediction. Environ Sci Pollut Res 28(9):11637–11649
30. Mohammadi B, Guan Y, Moazenzadeh R, Safari MJS (2021) Implementation of hybrid particle swarm optimization-differential evolution algorithms coupled with multi-layer perceptron for suspended sediment load estimation. CATENA 198:105024
31. Prasad AM, Iverson LR, Andy L (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9(2):181–199
32. Qasem SN, Samadianfard S, Sadri Nahand H, Mosavi A, Shamshirband S, Chau KW (2019) Estimating daily dew point temperature using machine learning algorithms. Water 11:582
33. Rajaee T, Mirbagheri SA, Zounemat-Kermani M, Nourani V (2009) Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models. Sci Total Environ 407:4916–4927
34. Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22:400–407
35. Rodriguez-Galiano V, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez J (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104
36. Roushangar K, Aghajani N, Ghasempour R, Alizadeh F (2021) The potential of ensemble WT-EEMD-kernel extreme learning machine techniques for prediction suspended sediment concentration in successive points of a river. J Hydroinf 23:655–670
37. Safari MJS (2020) Hybridization of multivariate adaptive regression splines and random forest models with an empirical equation for sediment deposition prediction in open channel flow. J Hydrol 590:125392
38. Safari MJS, Aksoy H, Mohammadi M (2016) Artificial neural network and regression models for flow velocity at sediment incipient deposition. J Hydrol 541:1420–1429
39. Samadianfard S, Hashemi S, Kargar K, Izadyar M, Mostafaeipour A, Mosavi A, Nabipour N, Shamshirband S (2020) Wind speed prediction using a hybrid model of the multi-layer perceptron and whale optimization algorithm. Energy Rep 6:1147–1159
40. Samantaray S, Sahoo A (2021) Prediction of suspended sediment concentration using hybrid SVM-WOA approaches. Geocarto International
41. Shabani S, Samadianfard S, Sattari MT, Mosavi A, Shamshirband S, Kmet T, Várkonyi-Kóczy AR (2020) Modeling pan evaporation using Gaussian process regression K-nearest neighbors random forest and support vector machines; comparative analysis. Atmosphere 11:66
42. Shirzad A, Safari MJS (2019) Pipe failure rate prediction in water distribution networks using multivariate adaptive regression splines and random forest techniques. Urban Water J 16(9):653–661
43. Singh N, Chakrapani GJ (2015) ANN modelling of sediment concentration in the dynamic glacial environment of Gangotri in Himalaya. Environ Monit Assess 187(8):494
44. Sivakumar B, Jayawardena AW (2002) An investigation of the presence of low-dimensional chaotic behaviour in the sediment transport phenomenon. Hydrol Sci J 47:37–41
45. Taddy M (2019) Business data science: Combining machine learning and economics to optimize, automate, and accelerate business decisions. McGraw-Hill, New York
46. Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res: Atmos 106:7183–7192
47. Verstraeten G, Poesen J (2001) Factors controlling sediment yield from small intensively cultivated catchments in a temperate humid climate. Geomorphology 40:123–144
48. Ward P, Balen RT, Verstraeten G, Renssen H, Vandenberghe J (2009) The impact of land use and climate change on late Holocene and future suspended sediment yield of the Meuse catchment. Geomorphology 103:389–400
49. Willmott CJ (1982) Some comments on the evaluation of model performance. Bull Am Meteor Soc 63:1309–1313
50. Zhang FX, Wai OWH, Jiang YW (2010) Prediction of sediment transportation in Deep Bay (Hong Kong) using genetic algorithm. J Hydrodyn, Ser B 22(5):599–604
51. Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137:239–263
52. Zounemat-Kermani M, Kisi O, Adamowski J, Ramezani-Charmahineh A (2016) Evaluation of data driven models for river suspended sediment concentration modeling. J Hydrol 535:457–472
53. Zounemat-Kermani M, Seo Y, Kim S, Ghorbani MA, Samadianfard S, Naghshara S, Kim NW, Singh VP (2019) Can decomposition approaches always enhance soft computing models? Predicting the dissolved oxygen concentration in the St. Johns River, Florida. Applied Sciences 9:2534

Funding

Not Applicable.

Author information

Authors and Affiliations

Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
Saeed Samadianfard, Sadra Shadkani, Sajjad Hashemi & Akram Abbaspour
Department of Civil Engineering, Ryerson University, Toronto, ON, Canada
Katayoun Kargar
Department of Civil Engineering, Yaşar University, Izmir, Turkey
Mir Jafar Sadegh Safari

Contributions

The author contributions are listed as follows: (1) Conceptualization: Saeed Samadianfard, Mir Jafar Sadegh Safari, (2) Data curation: Sadra Shadkani, Sajjad Hashemi, (3) Formal analysis: Saeed Samadianfard, Katayoun Kargar, Sadra Shadkani, (4) Investigation: Saeed Samadianfard, Katayoun Kargar, Akram Abbaspour, (5) Methodology: Katayoun Kargar, Sadra Shadkani, Sajjad Hashemi, (6) Resources: Saeed Samadianfard, Akram Abbaspour, Mir Jafar Sadegh Safari, (7) Software: Sadra Shadkani, Sajjad Hashemi, (8) Supervision: Saeed Samadianfard, Akram Abbaspour, Mir Jafar Sadegh Safari, (9) Validation: Saeed Samadianfard, Mir Jafar Sadegh Safari, (10) Visualization: Saeed Samadianfard, Sadra Shadkani, Sajjad Hashemi, (11) Writing—original draft: Saeed Samadianfard, Katayoun Kargar, (12) Writing—review & editing: Saeed Samadianfard, Mir Jafar Sadegh Safari

Corresponding author

Correspondence to Saeed Samadianfard.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Samadianfard, S., Kargar, K., Shadkani, S. et al. Hybrid models for suspended sediment prediction: optimized random forest and multi-layer perceptron through genetic algorithm and stochastic gradient descent methods. Neural Comput & Applic 34, 3033–3051 (2022). https://doi.org/10.1007/s00521-021-06550-1

Received: 17 January 2021
Accepted: 14 September 2021
Published: 07 October 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s00521-021-06550-1

1 Introduction