Introduction

The high-velocity jet over dam spillways can create deep scour holes and riverbed degradation downstream of the spillways in alluvial beds (Roushangar et al. 2016; Sattar et al. 2017). Scouring is the removal of sediment around a hydraulic structure in an alluvial stream. Often, the structure itself alters the surrounding flow pattern in a way that enhances sediment transport and thereby initiates scouring (Aghaee-Shalmani and Hakimzadeh 2015), which can eventually damage or destroy the structure (Ebtehaj et al. 2018; Regazzoni and Marot 2011). Grade-control structures (GCSs) are made of sand, stone, wood, concrete, or other materials and are built to limit erosion in riverbeds and downstream of dam spillways (Guven and Gunal 2008). They are commonly used to manage flood flows and to prevent bed erosion and sediment transport (Galia et al. 2016). Local scour modeling is therefore an important topic in river hydrodynamics, both for preventing riverbed degradation and for protecting the integrity of grade-control structures (Laucelli and Giustolisi 2011; Riahi-Madvar et al. 2019a, b).

During the past five decades, many traditional and statistical equations have been developed from experimental and field data to predict the scour depth (Hooshyaripor et al. 2014; Riahi-Madvar et al. 2019a). Although empirical equations are easy to use, they often overestimate or underestimate the scour depth in the validation phase (Roushangar et al. 2016). Hence, several soft computing models have been developed using effective parameters including upstream water head, weir height, tailwater level, bed particle grain-size distribution, and particle density (Roushangar et al. 2016; Sattar et al. 2017; Eghbalzadeh et al. 2018; Sharafati et al. 2019). Ebtehaj et al. (2015) applied the group method of data handling (GMDH) to predict the discharge coefficient of spillways. Mehri et al. (2019) also used GMDH to predict the discharge coefficient of a piano key weir, and Najafzadeh et al. (2015) used the method to estimate the scour depth around bridge abutments. These studies confirmed the superiority of GMDH over artificial neural networks (ANNs) and nonlinear regression equations.

Gene expression programming (GEP) is another soft computing model frequently applied in hydraulic engineering (Guven and Gunal 2008; Najafzadeh and Barani 2011; Azamathulla 2012; Alavi et al. 2011; Moussa 2013; Mesbahi et al. 2016; Riahi-Madvar et al. 2019b and 2021b).

The least-squares support vector machine (LSSVM) is another expert system model that has been used extensively to predict the scour depth at various hydraulic structures (Etemad-Shahidi and Ghaemi 2011; Pal et al. 2011; Chou and Pham 2014, 2017; Najafzadeh et al. 2016; Pourzangbar et al. 2017; Goel and Pal 2009; Barzegar et al. 2019; Chaucharda et al. 2004; Samui and Kothari 2011; Lu et al. 2016; Wang et al. 2017; Han et al. 2019; Hoang 2019; Seifi and Riahi 2020).

Among ANN-based models, the extreme learning machine (ELM) is a relatively new model (Nourani et al. 2017) that has been employed in many disciplines owing to its high performance and novel design (Huang et al. 2012; Sahoo et al. 2013; Abdullah et al. 2015; Yaseen et al. 2016; Ebtehaj et al. 2018). ELM reduces training time because it uses a single-layer feed-forward neural network whose hidden-node parameters are selected randomly (Huang et al. 2004; Liang et al. 2006; Zhao et al. 2018; Imani et al. 2018; Yousif et al. 2019).

A review of recent studies on scour depth prediction shows that they focus on evaluating artificial intelligence (AI) models on training and testing datasets. In a rigorous study, Scurlock et al. (2012) found that in most AI approaches the data used for model development were insufficient to provide reliable predictions of the scour depth, and model errors could reach 300%. Limited training and test datasets can restrict the cross-validation of GCS scour depth prediction models during model development, so that the models fail to capture the complex relationships between the key factors.

Another major disadvantage of these models stems from the selection of input parameters. The scour depth downstream of GCSs can be predicted from different parameters, including the height of the structure, flow depth, grain size, and tailwater depth (D’Agostino and Ferro 2004). Pan et al. (2013) showed the importance of flow energy in the scouring process and its applicability to scour depth prediction. Regazzoni and Marot (2011) formulated a new erosion resistance index based on the energy exchange between fluid and sediment. Pagliara et al. (2015) reported that the energy method is reasonably accurate for predicting the state of scour. Based on a survey of previous studies, the energy-based approach has not yet been integrated with AI models to estimate the scour depth downstream of GCSs. In addition, the reliability and resilience of the developed models over validation data have not been analyzed.

Another novel contribution to scour modeling downstream of GCSs is the uncertainty analysis of AI model results. All engineering design processes involve uncertainty, and the values of model parameters cannot be estimated exactly (Seifi et al. 2020a; Riahi-Madvar and Seifi 2018). Hence, uncertainty is considered part of the design in reliability analysis (Johnson 1992; Alimohammadi et al. 2019, 2020). Scale differences and the lack of measurements at real scale can cause uncertainty (D’Agostino and Ferro 2004; Azamathulla et al. 2005; Seifi et al. 2020b). The scour equations formulated in previous studies were simple and did not investigate the uncertainty of their parameters (Khalid et al. 2019; Muzzammil and Siddiqui 2009). All model parameters were assumed to be known with certainty, whereas hydraulic parameters such as flow depth and discharge are stochastic in nature, so equations based on these parameters are also stochastic and uncertain. Therefore, the scour phenomenon and its models are uncertain, and a probabilistic method is required to analyze the uncertainty of model results. So far, the scouring process has not been modeled accurately because of parameter uncertainty, incompletely understood physics, and measurement errors (Yanmaz and Cicekdag 2001; Galia et al. 2016; Lenzi and Comiti 2003). The Monte Carlo method has been used extensively in different fields to investigate the uncertainty of various AI models (Seifi et al. 2020a, b; Riahi-Madvar et al. 2011; Brandimarte et al. 2006; Sharafati et al. 2020; Gholami et al. 2018).

The present study applies a new approach to address major shortcomings of previous scour models for GCSs, including biased input variables, the lack of explicit AI-based predictive equations, and the lack of quantification of model uncertainty, reliability, and resilience. It examines the effect of energy as an input variable, a parameter rarely considered in predicting scour downstream of GCSs. It also addresses the absence of uncertainty, reliability, and resilience analysis for GCS scour prediction models, another major drawback of traditional models. To achieve these goals, GEP, ELM, LSSVM, and GMDH models are developed with an energy-based approach to predict the scour depth downstream of GCSs from existing field and laboratory data, their performance is compared, and validation data are used to evaluate the results.

Materials and methods

Datasets

The expert system models were developed using 318 data records: 276 laboratory records (Veronese 1937; Bormann and Julien 1991; D’Agostino 1994; Mossa 1998; Lenzi et al. 2000; Ben Meftah and Mossa 2020) and 42 field records from the Missiga stream (Falciai and Giacomin 1978; D’Agostino and Ferro 2004) (Table 1). The 13 records of Lenzi et al. (2000) were held out as validation data for cross-validation and for the reliability and resilience analysis of the developed models. Of the remaining 305 records, 75% (229 records, randomly chosen) were used to train the models and 25% (76 records) to test them.
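The counts above imply that the 13 Lenzi et al. (2000) records are set aside first and the remaining 318 − 13 = 305 records are then split, since 229 + 76 = 305. A minimal sketch of that partitioning, assuming each record carries a hypothetical `id` field and using an arbitrary random seed (the study specifies only that the split was random):

```python
import random


def split_records(records, holdout_ids, train_frac=0.75, seed=0):
    """Set aside validation records, then randomly split the rest into train/test."""
    holdout = set(holdout_ids)
    validation = [r for r in records if r["id"] in holdout]
    remaining = [r for r in records if r["id"] not in holdout]
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    rng.shuffle(remaining)
    n_train = round(train_frac * len(remaining))  # round(0.75 * 305) = 229
    return remaining[:n_train], remaining[n_train:], validation


# Hypothetical placeholder records: ids 305-317 stand in for the Lenzi et al. (2000) data.
records = [{"id": i} for i in range(318)]
train, test, validation = split_records(records, holdout_ids=range(305, 318))
print(len(train), len(test), len(validation))  # 229 76 13
```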

Table 1 Laboratory and field datasets used in this study

Identification of influencing parameters on the scour depth

Previous studies indicate that the maximum scour depth downstream of a grade-control structure, caused by the erosive power of the flow, is a function of geometric, hydraulic, and bed properties, as expressed in the following functional form:

$$\mathrm s=\mathrm f(\mathrm Z,\mathrm B,\mathrm b,\mathrm H,\mathrm h,\mathrm Q,{\mathrm \rho}_{\mathrm s},\mathrm \rho,\mathrm g,{\mathrm d}_{90},{\mathrm d}_{50})$$
(1)

where s denotes the scour depth, Z is the height of the grade-control structure, B is the width of the structure, b is the width of the spillway, H is the difference between the upstream and downstream water depths, h is the downstream water depth, Q is the water discharge, ρs is the sediment density, ρ is the water density, g is the gravitational acceleration, and d90 and d50 are the grain sizes for which 90% and 50%, respectively, of the bed material by weight is finer (Zheng et al. 2021). Box plots of the parameters affecting the scour depth, based on the datasets collected in this study, are presented in Fig. 1. Each box plot summarizes the distribution of a measured parameter and indicates its skewness, center, spread, and any extreme outliers in the datasets used.

Fig. 1

Box plot of parameters affecting scour according to the collected data sets: a dimensional; b dimensionless

Previous studies have shown that dimensionless parameters produce better results than dimensional ones (Azamathulla et al. 2005; Guven 2011; Najafzadeh et al. 2014; Mesbahi et al. 2016). Therefore, the Buckingham π theorem is used to obtain the dimensionless parameters affecting the scour depth (Zadehmohamad and Bolouri Bazaz 2017). Using Z, Q, and ρ as repeating variables, the dimensionless parameters of the phenomenon are derived as follows:

$$\pi_1=\frac sZ, \pi_2=\frac bZ, \pi_3=\frac BZ, \pi_4=\frac hZ, \pi_5=\frac HZ, \pi_6=\frac{\rho_s}\rho, \pi_7=\frac{{gZ}^5}{Q^2}, \pi_8=\frac{d_{50}}Z, \pi_9=\frac{d_{90}}Z$$
(2)
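One way to double-check the groups in Eq. (2) is exponent bookkeeping: each variable is assigned its exponents of mass, length, and time, and a group is dimensionless when the exponents sum to zero. The sketch below is an illustration only, and the helper names are hypothetical.

```python
# Exponents of (M, L, T) for each variable in Eq. (1).
DIMS = {
    "s": (0, 1, 0), "Z": (0, 1, 0), "B": (0, 1, 0), "b": (0, 1, 0),
    "H": (0, 1, 0), "h": (0, 1, 0), "d50": (0, 1, 0), "d90": (0, 1, 0),
    "Q": (0, 3, -1), "g": (0, 1, -2), "rho": (1, -3, 0), "rho_s": (1, -3, 0),
}

# Each pi group of Eq. (2), written as {variable: power}.
PI_GROUPS = {
    "pi1": {"s": 1, "Z": -1},
    "pi2": {"b": 1, "Z": -1},
    "pi3": {"B": 1, "Z": -1},
    "pi4": {"h": 1, "Z": -1},
    "pi5": {"H": 1, "Z": -1},
    "pi6": {"rho_s": 1, "rho": -1},
    "pi7": {"g": 1, "Z": 5, "Q": -2},
    "pi8": {"d50": 1, "Z": -1},
    "pi9": {"d90": 1, "Z": -1},
}


def dimension(group):
    """Return the summed (M, L, T) exponents of a product of powers."""
    total = [0.0, 0.0, 0.0]
    for var, power in group.items():
        for k, exponent in enumerate(DIMS[var]):
            total[k] += power * exponent
    return tuple(total)


print(all(dimension(g) == (0, 0, 0) for g in PI_GROUPS.values()))  # True
```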

By combining the above parameters, and rearranging the dimensionless numbers, new parameters can be obtained as follows:

$$\frac{\pi_9}{\pi_8}=\frac{d_{90}}{d_{50}},\quad\frac{\pi_2}{\pi_3}=\frac{b}{B},\quad\frac{\pi_4}{\pi_5}=\frac{h}{H}$$
(3)

D’Agostino and Ferro (2004) and Najafzadeh (2015) used incomplete self-similarity theory and reported that the dimensionless parameter A50 has a significant effect on the scour process. This parameter is defined as follows:

$$A_{50}=\frac{1}{\pi_2}\left(\frac{1}{(\pi_6-1)\,\pi_7\,\pi_8}\right)^{1/2}=\frac{Q}{bZ\left[g\,d_{50}\left(\frac{\rho_s-\rho}{\rho}\right)\right]^{1/2}}$$
(4)
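Because Eq. (4) gives A50 both in terms of the π groups and directly in terms of the physical variables, a quick numerical check is that the two forms agree. The sample values below (SI units, quartz-like sediment) are hypothetical and serve only to exercise the algebra.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def a50_direct(Q, b, Z, d50, rho_s, rho, g=G):
    """Right-hand side of Eq. (4)."""
    return Q / (b * Z * math.sqrt(g * d50 * (rho_s - rho) / rho))


def a50_from_pi(Q, b, Z, d50, rho_s, rho, g=G):
    """Middle expression of Eq. (4), built from the pi groups of Eq. (2)."""
    pi2 = b / Z
    pi6 = rho_s / rho
    pi7 = g * Z**5 / Q**2
    pi8 = d50 / Z
    return (1.0 / pi2) * math.sqrt(1.0 / ((pi6 - 1.0) * pi7 * pi8))


# Hypothetical laboratory-scale values: Q = 0.01 m^3/s, b = 0.3 m, Z = 0.1 m, d50 = 2 mm.
args = dict(Q=0.01, b=0.3, Z=0.1, d50=0.002, rho_s=2650.0, rho=1000.0)
print(a50_direct(**args), a50_from_pi(**args))  # both approximately 1.85
```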

and the functional form of scour depth is rewritten as:

$$\frac{s}{Z}=f(\frac{b}{B}, \frac{h}{H}, \frac{{d}_{90}}{{d}_{50}}, {A}_{50}, \frac{b}{Z})$$
(5)

As discussed before, Pagliara et al. (2015) and Pan et al. (2013) used the energy parameter in scour analysis. Here, the Buckingham π theorem is therefore used to extract the corresponding dimensionless parameter, \(\pi_{10}=\frac{E}{Z}\). The flow energy (E) at the GCS is defined by Eq. (6):

$$E=Z+h_{0}+\frac{u^{2}}{2g},\qquad u=\frac{Q}{B\,h_{0}}$$
(6)

where \(h_{0}=H+h-Z\) and u is the flow velocity.
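A minimal sketch of Eq. (6) and of the resulting scenario-7 input vector, assuming SI units. The function names are hypothetical, and the energy input is expressed as Z/E (the reciprocal of π10), the form that appears in the GEP-7 equation and the sensitivity discussion in the results.

```python
import math


def energy_head(Z, H, h, Q, B, g=9.81):
    """Eq. (6): E = Z + h0 + u^2/(2g), with h0 = H + h - Z and u = Q/(B*h0)."""
    h0 = H + h - Z
    if h0 <= 0:
        raise ValueError("h0 = H + h - Z must be positive")
    u = Q / (B * h0)
    return Z + h0 + u**2 / (2.0 * g)


def scenario7_inputs(Z, B, b, H, h, Q, rho_s, rho, d50, d90, g=9.81):
    """Dimensionless inputs of Eq. (5) plus the energy term Z/E."""
    E = energy_head(Z, H, h, Q, B, g)
    a50 = Q / (b * Z * math.sqrt(g * d50 * (rho_s - rho) / rho))
    return {
        "b/B": b / B,
        "h/H": h / H,
        "d90/d50": d90 / d50,
        "A50": a50,
        "b/Z": b / Z,
        "Z/E": Z / E,
    }


# Hypothetical values: h0 = 0.15 + 0.05 - 0.1 = 0.1 m, u = 0.01 / (0.3 * 0.1) = 0.33 m/s.
x = scenario7_inputs(Z=0.1, B=0.3, b=0.3, H=0.15, h=0.05, Q=0.01,
                     rho_s=2650.0, rho=1000.0, d50=0.002, d90=0.006)
print(x)
```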

The scenario-based approach for the selection of input vectors

To investigate the effects of different input parameter configurations and to evaluate the sensitivity of the expert system models to them, a scenario-based approach is applied for selecting the input vectors (Seyedian et al. 2014). Seven scenarios are created (Fig. 2); in scenarios 2–6, one parameter is eliminated in turn to investigate the effect of each input on the model output. In the first scenario, all parameters in Eq. (5), namely \(\frac{b}{B}, \frac{h}{H}, \frac{{d}_{90}}{{d}_{50}}, {A}_{50}, \frac{b}{Z}\), are used as the input vector for estimating the scour depth, without energy, and \(\frac{s}{Z}\) is the output. In the second scenario, \(\frac{b}{B}\) is eliminated from the first scenario, and this procedure is repeated for each of the remaining parameters in scenarios 3–6. As discussed, energy is a significant parameter in scour studies. Given its importance, scenario 7 combines the first scenario with the energy parameter as a new dimensionless input. The four expert system models (GEP, GMDH, ELM, and LSSVM) were trained, tested, and validated for each of these seven scenarios.
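The scenario layout can be generated programmatically. This sketch assumes that scenarios 3–6 remove h/H, d90/d50, A50, and b/Z in the order of Eq. (5), which agrees with the sensitivity discussion in the results (scenario 4 without d90/d50, scenario 5 without A50, scenario 6 without b/Z); Fig. 2 remains the reference for the exact inputs.

```python
BASE_INPUTS = ["b/B", "h/H", "d90/d50", "A50", "b/Z"]  # Eq. (5), scenario 1


def build_scenarios(base=BASE_INPUTS, energy="Z/E"):
    """Scenario 1: all inputs; scenarios 2-6: drop one input each; scenario 7: add energy."""
    scenarios = {1: list(base)}
    for number, dropped in enumerate(base, start=2):
        scenarios[number] = [p for p in base if p != dropped]
    scenarios[len(base) + 2] = list(base) + [energy]
    return scenarios


for number, inputs in build_scenarios().items():
    print(number, inputs)
```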

Fig. 2

Input parameters in the seven defined scenarios

The soft computing models

GEP

Genetic programming (GP) was first introduced by Cramer (1985) and expanded by Koza (1992), and gene expression programming (GEP) was developed by Ferreira (2001). GEP is a variant of the genetic algorithm (GA) and belongs to the family of evolutionary algorithms based on Darwinian natural selection. The main difference between these algorithms lies in the nature of the population members. In GA, the individuals are strings (chromosomes) of fixed length; in GP, they are non-linear entities of different shapes and sizes (parse trees); and in GEP, they are encoded as linear strings of fixed length that are expressed as non-linear entities of different shapes and sizes (expression trees) (Guven and Gunal 2008). In general, GEP is an automated technique for finding solutions to a problem in the form of computer programs. Unlike many other methods, it can automatically select the inputs that have the greatest impact on the model.
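The genotype-to-phenotype mapping that distinguishes GEP can be illustrated with Ferreira's Karva notation, in which a fixed-length linear gene is read level by level into an expression tree and any trailing symbols are left unused. The sketch covers only this decoding and evaluation step, not the evolutionary search; the function set (with Q for square root) and the terminal names a–d are illustrative assumptions.

```python
import math

# Function set: symbol -> (arity, implementation). "Q" denotes the square root.
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "/": (2, lambda x, y: x / y),
    "Q": (1, math.sqrt),
}


def decode_karva(gene):
    """Read a Karva string breadth-first into a tree of [symbol, *children] lists.

    GEP's head/tail length rule guarantees that a valid gene never runs out of symbols.
    """
    nodes = [[symbol] for symbol in gene]
    level, pos = [nodes[0]], 1
    while level:
        next_level = []
        for node in level:
            arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
            for _ in range(arity):
                child = nodes[pos]
                pos += 1
                node.append(child)
                next_level.append(child)
        level = next_level
    return nodes[0]  # symbols from index pos onward are non-coding


def evaluate(tree, env):
    """Evaluate an expression tree; terminals are looked up in env."""
    symbol, *children = tree
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]


# "Q*-+abcd" decodes to sqrt((a - b) * (c + d)); the trailing "ab" is unused.
tree = decode_karva("Q*-+abcdab")
print(evaluate(tree, {"a": 5.0, "b": 1.0, "c": 2.0, "d": 2.0}))  # 4.0
```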

GMDH

The multi-objective neural network form of the group method of data handling (GMDH) was developed as a multivariate analysis method for capturing, modeling, and predicting the behavior of complex systems without detailed expert knowledge of their mechanisms (Ivakhnenko 1971). The main purpose of the algorithm is to identify the structure of a complex model from the knowledge inherent in the data representing the phenomenon, rather than from trial and error or the user's prior assumptions (Farlow 1981). The relationship between input and output data can be approximated by the Volterra-Kolmogorov-Gabor series (Ivakhnenko 1971) as follows:

$$Y_i=a+\sum_{i=1}^{M}b_i x_i+\sum_{i=1}^{M}\sum_{j=1}^{M}c_{ij}x_i x_j+\sum_{i=1}^{M}\sum_{j=1}^{M}\sum_{k=1}^{M}d_{ijk}x_i x_j x_k+\cdots$$
(7)

where Yi represents the output, xi, xj, …, xk are the inputs, a, bi, cij, dijk are the polynomial coefficients, and M is the number of independent variables (Najafzadeh et al. 2015; Dargahi-Zarandi et al. 2017).
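In practice, GMDH approximates Eq. (7) with a network of simple two-input quadratic neurons (the Ivakhnenko polynomial), each fitted by least squares, and passes only the best neurons on a separate selection set to the next layer. The NumPy sketch below builds and ranks a single layer; the number of neurons kept, the RMSE selection criterion, and the synthetic data are illustrative assumptions rather than the study's settings (Table 2).

```python
import itertools

import numpy as np


def quad_terms(xi, xj):
    """Columns of the two-input Ivakhnenko polynomial: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])


def gmdh_layer(X_fit, y_fit, X_sel, y_sel, keep=3):
    """Fit one quadratic neuron per input pair; rank neurons by RMSE on the selection set."""
    neurons = []
    for i, j in itertools.combinations(range(X_fit.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(quad_terms(X_fit[:, i], X_fit[:, j]), y_fit, rcond=None)
        pred = quad_terms(X_sel[:, i], X_sel[:, j]) @ coef
        rmse = float(np.sqrt(np.mean((pred - y_sel) ** 2)))
        neurons.append((rmse, (i, j), coef))
    neurons.sort(key=lambda n: n[0])
    return neurons[:keep]


def layer_outputs(X, neurons):
    """Outputs of the kept neurons, which become the inputs of the next layer."""
    return np.column_stack([quad_terms(X[:, i], X[:, j]) @ c for _, (i, j), c in neurons])


# Synthetic example: y depends only on inputs 0 and 1, so pair (0, 1) should rank first.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = 1.0 + 2.0 * X[:, 0] * X[:, 1]
best = gmdh_layer(X[:40], y[:40], X[40:], y[40:])
print(best[0][1], best[0][0])  # (0, 1) and an RMSE close to zero
```

Stacking further layers repeats `gmdh_layer` on the output of `layer_outputs` until the selection error stops improving.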

ELM

The extreme learning machine (ELM) is a fast learning algorithm with good generalization performance that belongs to the family of single-layer feed-forward neural networks (SLFNs); it was introduced by Huang et al. (2006, 2012). Learning in ELM is faster, and ELM generalizes better and is more robust and accurate than traditional algorithms such as the back-propagation artificial neural network (BPANN) (Huang et al. 2004; Imani et al. 2018; Mahmoud et al. 2018). The output of an SLFN with L hidden nodes is defined as (Huang et al. 2006; Liang et al. 2006):

$$\begin{array}{cc}{\mathrm f}_{\mathrm L}\left(\mathrm x\right)={\textstyle\sum_{\mathrm i=1}^{\mathrm L}}\;{\mathrm\beta}_{\mathrm i}\mathrm G\left({\mathrm a}_{\mathrm i},{\mathrm b}_{\mathrm i},\mathrm x\right),&\;\mathrm x\in\mathrm R^{\mathrm n},\;{\mathrm a}_{\mathrm i}\in\mathrm R^{\mathrm n}\end{array}$$
(8)

where \(a_i\) and \(b_i\) are the ELM learning parameters of the ith hidden node, \(G\left(a_i,b_i,x\right)\) is the output of the ith hidden node for input x, and \(\beta_i\) is the weight vector connecting the ith hidden node to the output node. The additive hidden node with \(G(x): R\to R\) as the activation function (e.g., radial basis) is presented as follows (Huang et al. 2006):

$$G\left(a,b,x\right)=\exp\left(-\left(a\cdot x+b\right)\right)$$
(9)

For N arbitrary distinct samples \(\left(x_i,t_i\right)\in R^{n}\times R^{m}\), where \(x_i\) is an n × 1 input vector and \(t_i\) is an m × 1 output vector, an SLFN with L hidden nodes approximates the N samples with negligible error as follows:

$$\begin{array}{cc}{\mathrm f}_{\mathrm L}\;\left({\mathrm x}_{\mathrm j}\right)={\textstyle\sum_{\mathrm i=1}^{\mathrm L}}\;{\mathrm\beta}_{\mathrm i}\mathrm G\left({\mathrm a}_{\mathrm i}{\mathrm x}_{\mathrm j}+{\mathrm b}_{\mathrm i}\right)&\mathrm j=1,2,\dots,\mathrm N\end{array}$$
(10)
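Training an ELM as described by Eqs. (8)–(10) amounts to drawing the hidden-node parameters a_i and b_i at random, computing the hidden-layer output matrix, and solving for the output weights β with the Moore–Penrose pseudoinverse. A minimal NumPy sketch using the activation of Eq. (9) as written; the number of hidden nodes, the uniform [-1, 1] initialization range, and the synthetic data are assumptions, and the inputs are presumed scaled to [0, 1].

```python
import numpy as np


def elm_hidden(X, A, b):
    """Hidden-layer output matrix with the activation of Eq. (9): exp(-(a . x + b))."""
    return np.exp(-(X @ A + b))


def elm_fit(X, T, n_hidden=30, seed=0):
    """Random input weights and biases; output weights by least squares (pseudoinverse)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # a_i, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # b_i, never trained
    beta = np.linalg.pinv(elm_hidden(X, A, b)) @ T           # beta_i of Eq. (8)
    return A, b, beta


def elm_predict(model, X):
    A, b, beta = model
    return elm_hidden(X, A, b) @ beta


# Synthetic example with two inputs in [0, 1].
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 2))
y = X[:, 0] + X[:, 1] ** 2
model = elm_fit(X, y)
rmse = np.sqrt(np.mean((elm_predict(model, X) - y) ** 2))
print(rmse)
```

Only `beta` is learned, and in a single linear solve, which is the source of ELM's short training time.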

LSSVM

The least-squares support vector machine (LSSVM) is a variant of the support vector machine (SVM) proposed by Vapnik (1995); it combines statistical learning theory and structural risk minimization with least-squares error minimization (Suykens et al. 2002). For a training set in which \(x_i\) and \(y_i\) are the input and output data, with \(i=1,2,\dots,N\), \(x_i\in R^{n}\), and \(y_i\in R\), the non-linear LSSVM function is expressed as follows:

$$f\left(x\right)=w^{T}\varphi\left(x\right)+b$$
(11)

where w is the weight vector, \(\varphi\left(x\right)\) maps x into a high-dimensional (possibly infinite-dimensional) feature space, and b is the bias term. LSSVM regression can be expressed as the following optimization problem:

$$\text{Minimize: } J\left(w,e\right)=\frac{1}{2}w^{T}w+\frac{\gamma}{2}\sum_{i=1}^{N}e_i^{2}$$
(12)

subject to the constraints

$$y_i=w^{T}\varphi\left(x_i\right)+b+e_i,\qquad i=1,2,\dots,N$$
(13)

where \(\gamma\) is a regularization parameter that adjusts the penalty on errors, and \(e_i\) is the regression error (Chaucharda et al. 2004; Barzegar et al. 2019; Cimen 2008; Misra et al. 2009; Kumar and Kar 2009; Kakaei Lafdani et al. 2013; Nourani et al. 2015; Zounemat-Kermani et al. 2016). The parameters used in the LSSVM model, along with those of the other three models, are presented in Table 2.
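Eliminating w and e_i from Eqs. (11)–(13) through the Lagrangian yields a linear system in the bias b and the multipliers α_i, which is what makes LSSVM training inexpensive. A NumPy sketch with a Gaussian (RBF) kernel K(x, x′) = exp(−‖x − x′‖²/(2σ²)); the kernel choice and the values of γ and σ are assumptions, not the settings listed in Table 2.

```python
import numpy as np


def rbf_kernel(X1, X2, sigma):
    """Gaussian kernel matrix, K[i, j] = exp(-||X1[i] - X2[j]||^2 / (2 sigma^2))."""
    sq = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=100.0, sigma=0.5):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y], the dual of Eqs. (12)-(13)."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, multipliers alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma=0.5):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, X[:5]), y[:5])
```

At the optimum, e_i = α_i/γ, so each training target equals the fitted value plus α_i/γ; the test below checks exactly that identity.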

Table 2 Parameters defined for GEP, ELM, GMDH, and LSSVM models

Existing equations predicting the scour depth

The best-known existing equations for scour depth prediction are presented in Table 3. Six equations are regression-based, and the other two are AI-based. D’Agostino and Ferro (2004) proposed an equation for predicting the scour depth from experimental and field data using regression analysis. Laucelli and Giustolisi (2011) developed 14 equations from 312 experimental and field records using multi-objective evolutionary paradigms; the two most accurate are given in Table 3. Guven (2011) proposed two equations using a multi-output descriptive neural network (DNN) and regression analysis, training and testing them on 82 records from the large-scale experiments of Bormann and Julien (1991). In these equations, \(F_{rd}\) is defined as \(F_{rd}=\frac{q}{\left[bZ\times\frac{\left(\rho_s-\rho\right)}{\rho_s}\times g\,d_{50}\right]^{0.5}}\).

Table 3 Scour depth relationships downstream of GCS

Sattar et al. (2017) developed three GEP models using 265 large-scale datasets, and their best equations are used for comparison in this study. Ben Meftah and Mossa (2020) performed 32 experiments and proposed Eq. (21) for predicting the scour depth, where λ represents the downstream face angle of the GCS, and \({Fr}_{sd}=\frac{q}{{h}_{0}\sqrt{\left(\frac{{\rho }_{s}-\rho }{\rho }\right)g{d}_{50}}}\).

The developed Monte-Carlo framework for uncertainty, reliability, and resiliency analysis

In this study, the AI models are hybridized with non-parametric Monte Carlo simulation (MCS) to evaluate the uncertainty in their results. From the MCS results, the probability distribution of the scour depth is determined and used for uncertainty quantification. Let \(\mathrm{Hsc}\) denote the structure of the simulator model for the scour hole at a GCS, \(g_s\) the input parameter vector with initial value \(g_{so}\), and \(\mathrm{Uq}\) the main source of uncertainty in the AI models. The scour depth (S) can then be written in the following functional form (Riahi-Madvar et al. 2021a):

$$\mathrm S=\widehat{\mathrm S}+\mathrm\varepsilon=\mathrm{Hsc}\left({\mathrm g}_{\mathrm s},{\mathrm g}_{\mathrm{so}},\mathrm{Uq}\right)+\mathrm\varepsilon$$
(22)

where \(\varepsilon\) is the error, defined as the difference between the observed scour depth (S) and the scour depth predicted by the AI model (\(\widehat{S}\)). Accordingly, the AI model uncertainties arising from the model's tuning parameters, architecture, and data clustering in the training phase can be written as

$$\begin{array}{cc}{\widehat{\mathrm S}}_{\mathrm i}=\mathrm{Hsc}\left({\mathrm g}_{\mathrm s},{\mathrm{Uq}}_{\mathrm i}\right)&\mathrm i=1,2,3,\dots.,\mathrm n\end{array}$$
(23)

where \(\mathrm{Uq}_{i}\) denotes the adaptive parameters of the AI model, \(\widehat{S}_{i}\) is the scour depth calculated by the AI model in the ith MCS run, and n is the number of MCS runs in AI model training. The results of the AI model over the MCS runs are quantified by the prediction interval (PI) as:

$$\mathrm P\left({\widehat{\mathrm S}}_{\mathrm i}<\widehat{\mathrm h}(\mathrm p)\right)={\textstyle\sum_{\mathrm i=1}^{\mathrm z}}\;\left({\mathrm w}_{\mathrm i}\vert{\widehat{\mathrm S}}_{\mathrm i}<\widehat{\mathrm h}(\mathrm p)\right)$$
(24)

where \(\widehat{h}\left(p\right)\) is the pth quantile in the ith MCS run, and \(w_i\) is the likelihood weight of the result of the ith training trial in the MCS. The upper and lower limits are calculated by:

$$\widehat{h}\left(p\right)=\begin{Bmatrix}PL^{u}=\overline{S_j}+t_{v,0.025}\,\sigma_j\\ PL^{l}=\overline{S_j}-t_{v,0.025}\,\sigma_j\end{Bmatrix}$$
(25)

where \(\overline{S_j}\) is the mean of the AI model results over the MCS runs, \(\sigma_j\) is their standard deviation, v is the number of degrees of freedom, and \(t_{v,0.025}\) is the threshold of the empirical cumulative distribution function of the AI model outputs from the MCS runs. The prediction intervals are determined by:

$$\begin{Bmatrix}PI^{u}=PL^{u}+\widehat{S}_{i}^{opt}\\ PI^{l}=PL^{l}+\widehat{S}_{i}^{opt}\end{Bmatrix}$$
(26)

where \(PI^{u}\) and \(PI^{l}\) are the upper and lower limits of the AI model results, and \(\widehat{S}_{i}^{opt}\) is the prediction of the optimal AI model. The confidence interval of the AI results for uncertainty quantification is determined from the 2.5th and 97.5th percentiles of 1000 MCS runs:

$$95\mathrm{PPU}=100\,\frac{\mathrm{Count}\left(S\mid S_L\le S\le S_U\right)}{N}$$
(27)
$$d\text{-factor}=\frac{1}{N\sigma}\sum_{i=1}^{N}\left(S_{U,i}-S_{L,i}\right)$$
(28)
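Given the predictions of all MCS runs, Eqs. (27) and (28) reduce to a few array operations. The sketch below takes the 2.5th and 97.5th percentiles of the ensemble at each data point as S_L and S_U (a direct empirical variant, rather than the t-based limits of Eqs. (25) and (26)) and assumes that σ in Eq. (28) is the standard deviation of the observed values, as is common for the d-factor; the synthetic ensemble is purely illustrative.

```python
import numpy as np


def ppu95_and_d_factor(ensemble, observed):
    """ensemble: array (n_runs, N) of predictions from MCS runs; observed: array (N,)."""
    s_lower = np.percentile(ensemble, 2.5, axis=0)
    s_upper = np.percentile(ensemble, 97.5, axis=0)
    bracketed = (observed >= s_lower) & (observed <= s_upper)
    ppu95 = 100.0 * np.mean(bracketed)                        # Eq. (27)
    d_factor = np.mean(s_upper - s_lower) / np.std(observed)  # Eq. (28)
    return ppu95, d_factor


# Synthetic illustration: 1000 runs scattered around hypothetical observations.
rng = np.random.default_rng(0)
observed = rng.uniform(0.5, 3.0, size=40)
ensemble = observed + rng.normal(0.0, 0.2, size=(1000, 40))
print(ppu95_and_d_factor(ensemble, observed))
```

A narrow band that still brackets most observations (high 95PPU, small d-factor) is the desirable combination.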

In Eqs. (27) and (28), \(S_L\) and \(S_U\) are the lower and upper limits of the 95PPU band, N is the number of observations, and σ is the standard deviation of the observed values. A fully reliable model brackets 100% of the data within the 95PPU; in practice, a 95PPU greater than 80% and a d-factor less than one are desirable. Furthermore, two metrics are used to quantify the reliability and resilience of the AI models for predicting the scour depth downstream of GCSs. The first, reliability, evaluates the overall consistency of the model based on the random error of the simulation; high reliability means that a model produces similar results under consistent conditions.

$$\mathrm{Reliability}=\frac{100}{\mathrm n}{\textstyle\sum_{\mathrm i=1}^{\mathrm n}}\;{\mathrm k}_{\mathrm i}$$
(29)

where \(k_i\) is determined from the relative absolute error (RAE) of each prediction and a threshold value α of RAE: if \(\mathrm{RAE}_i\le\alpha\), then \(k_i=1\); otherwise, \(k_i=0\). The RAE is calculated as

$${\mathrm{RAE}}_{\mathrm i}=\left|\frac{{\mathrm S}_{\mathrm{ro},\mathrm i}-{\mathrm S}_{\mathrm{rp},\mathrm i}}{{\mathrm S}_{\mathrm{ro},\mathrm i}}\right|$$
(30)

where \(S_{ro,i}\) is the measured value, \(S_{rp,i}\) is the estimated model output, and n is the number of data points. The second index, resiliency, evaluates the ability of the predicted S values, relative to the observed values, to withstand stressors, adapt, and rapidly recover from inaccurate estimations. A higher resiliency indicates that the predictions are more robust to noise; it is calculated as:

$$\mathrm{Resiliency}=\frac{\sum_{i=1}^{n-1}{r}_{i}}{n-\sum_{i=1}^{n}{k}_{i}}\times 100$$
(31)
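Eqs. (29)–(31) can be computed in a single pass over the predictions. In this sketch, the threshold α is a placeholder (its value is not stated in this section), and r_i is taken as 1 when an inaccurate prediction (k_i = 0) is immediately followed by an accurate one (k_{i+1} = 1), which matches the n − 1 upper limit of the sum in Eq. (31); both are assumptions.

```python
def reliability_and_resiliency(observed, predicted, alpha=0.1):
    """Eqs. (29)-(31); alpha is an assumed RAE threshold."""
    n = len(observed)
    # Eq. (30) RAE compared with the threshold: k_i = 1 for an accurate prediction
    k = [1 if abs((o - p) / o) <= alpha else 0 for o, p in zip(observed, predicted)]
    reliability = 100.0 * sum(k) / n
    # r_i = 1 when an inaccurate prediction is followed by an accurate one
    r = [1 if k[i] == 0 and k[i + 1] == 1 else 0 for i in range(n - 1)]
    n_failures = n - sum(k)
    # Resiliency is undefined (NaN) when no prediction fails the threshold
    resiliency = 100.0 * sum(r) / n_failures if n_failures else float("nan")
    return reliability, resiliency


print(reliability_and_resiliency([1.0, 1.0, 1.0, 1.0], [1.0, 1.5, 1.0, 1.5]))  # (50.0, 50.0)
```

Because r_i depends on consecutive predictions, the resiliency value depends on the order in which the validation records are listed.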

where \(r_i\) counts the cases in which the simulation recovers from an inaccurate prediction to an accurate one (i.e., \(r_i=1\) when \(k_i=0\) and \(k_{i+1}=1\); otherwise, \(r_i=0\)).

Four criteria are also used to evaluate the expert systems and to identify the most critical parameters for scour depth prediction: the coefficient of determination (R2), root mean square error (RMSE), mean bias error (MBE) (Seyedian and Rouhani 2015), and Akaike information criterion (AIC), given in Eqs. (32)–(35). R2 values closer to one indicate better performance and more accurate predictions. The MBE measures the bias of the predictions and should ideally be zero; positive and negative values indicate a tendency to overestimate and underestimate, respectively. Both RMSE and MBE represent the predictive error of the models: the closer they are to zero, the more accurate the model and the closer the predictions are to the observations. In the ideal case where all predicted and observed values are equal, MBE = 0, RMSE = 0, and R2 = 1:

$$R^2=1-\frac{\sum_{i=1}^{n}\left(O_i-P_i\right)^2}{\sum_{i=1}^{n}\left(O_i-\overline{O}\right)^2}$$
(32)
$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(P_i-O_i\right)^2}{n}}$$
(33)
$$\mathrm{MBE}=\frac1{\mathrm n}{\textstyle\sum_{\mathrm i=1}^{\mathrm n}}\left({\mathrm P}_{\mathrm i}-{\mathrm O}_{\mathrm i}\right)$$
(34)
$$\mathrm{AIC}=\mathrm n\times\ln\left(\mathrm{RMSE}^2\right)+2\mathrm K$$
(35)

In these equations, P denotes predicted values, O denotes observed values, \(\overline{O }\) is the mean of observed values, n is the number of data, and K represents the total number of model parameters including all constant and variable parameters.
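For reference, a plain-Python sketch of Eqs. (32)–(35). Note that Eq. (32) is the 1 − SSE/SST form of R², which can be negative for poor models, and that Eq. (35) is undefined for a perfect fit (RMSE = 0). The parameter count K must be supplied for each model; the example values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted, n_params):
    """R2 (Eq. 32), RMSE (Eq. 33), MBE (Eq. 34), and AIC (Eq. 35)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": rmse,
        "MBE": sum(p - o for o, p in zip(observed, predicted)) / n,  # > 0: overestimation
        "AIC": n * math.log(rmse**2) + 2 * n_params,
    }


print(evaluation_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 4.5], n_params=3))
```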

Results and discussion

Scenario-based evaluation of expert systems

Table 4 presents the MBE, RMSE, and R2 values of the expert models in the training and testing phases for the seven scenarios, and Fig. 3 displays scatter plots of predicted versus observed scour depths in the testing phase. In the first scenario, some GEP points fall below the 1:1 line, but they are generally less dispersed than the predictions of the ELM, GMDH, and LSSVM models (Fig. 3a). Table 4 shows that GEP has the highest coefficient of determination (R2 = 0.89) and the lowest error (RMSE = 0.95) of the four models in the testing phase. The ELM model performs only slightly worse than GEP (RMSE = 1.02, R2 = 0.86). For the GMDH model, most points appear close to the 1:1 line, but its coefficient of determination indicates slightly more scatter than for the GEP and ELM models.

Table 4 Coefficient of determination and error of the evaluated LSSVM, ELM, GEP, GMDH scour depth prediction models for seven applied scenarios
Fig. 3

Comparison of predicted and observed values of scour depth at the downstream of grade control structures for testing phase: a scenario 1, b scenario 2, c scenario 3, d scenario 4, e scenario 5, f scenario 6, and g scenario 7. (x-axis is observed s/Z and y-axis is predicted s/Z)

In the second scenario (Fig. 3b), a few GEP points lie outside the ± 25% band, and the model mainly underestimates the scour depth (MBE =  − 0.14). In this scenario, the ELM model has a lower bias than the other models (MBE =  − 0.01) and almost the same coefficient of determination and error as GEP. According to Fig. 3b and Table 4, the GMDH model underestimates the scour depth (MBE =  − 0.16), with about 15% higher error than the ELM model.

In the third scenario, the GEP and ELM models perform almost identically in terms of error and scatter. GMDH predictions are more scattered: its coefficient of determination is 20% lower than those of GEP and ELM, and its error is 30% higher.

In the fourth scenario, most GMDH predictions lie below the 1:1 line, indicating that the model significantly underestimates the scour depth. In the fifth scenario, the GMDH model moderately underestimates and the LSSVM model significantly overestimates the scour depth.

In the sixth scenario, the GEP and ELM models have similar errors (RMSE = 1.03), whereas the GMDH and LSSVM models show higher errors and lower coefficients of determination. The GEP and ELM predictions are almost identical and lie within the ± 25% band, with errors mostly in the form of underestimation.

In the seventh scenario, which adds energy as a new input parameter, the GEP model shows the lowest error (RMSE = 0.88, MBE =  − 0.12) and the highest coefficient of determination (R2 = 0.92), and most of its points lie within the ± 25% band. The ELM model has a 10% higher error than GEP. All models generally underestimate the scour depth.

As Table 4 and Fig. 3 show, the LSSVM model has the lowest prediction accuracy in all scenarios, whereas most GEP predictions are reasonable compared with those of the other three models.

Among the evaluated models, GEP gives the best results in scenarios 1 (GEP-1) and 7 (GEP-7). The GEP equations for these two scenarios, taken as the final scour depth prediction equations, are:

$$\mathrm{GEP}\text{-}1:\ \frac{s}{Z}=\left[\left(\frac{h}{H}\right)^{1.27}\times 1.42\,\frac{b}{B}-\frac{h}{H}\right]+\left[\left(\sqrt{\frac{d_{90}}{d_{50}}}\right)^{1.11\left(\frac{b}{B}\times\frac{h}{H}\right)\times A_{50}}\right]+\left[\frac{d_{90}}{d_{50}}\left(-0.00234\left(\frac{d_{90}}{d_{50}}\times\frac{b}{Z}-9.82\right)\right)\right]$$
(36)
$$\mathrm{GEP}\text{-}7:\ \frac{s}{Z}=\left[\frac{A_{50}}{\left(\frac{b}{B}-\frac{d_{90}}{d_{50}}\right)-\frac{b}{B}}\right]+\left[A_{50}\sqrt{\left(\sqrt{\frac{d_{90}}{d_{50}}}\right)^{\left(\frac{b}{B}-\frac{h}{H}\right)\times\frac{h}{H}\times\frac{d_{90}}{d_{50}}}}\right]+\left[\frac{b}{B}\left(\frac{d_{90}}{d_{50}}+\frac{Z}{E}+2\,\frac{h}{H}\right)\left(\sqrt{\frac{Z}{E}}-\frac{Z}{E}\right)\right]$$
(37)
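Because both expressions are explicit, they can be evaluated directly. The Python sketch below transcribes Eqs. (36) and (37) term by term; the function and argument names (`gep1`, `gep7`, `h_H`, `b_B`, `d90_d50`, `A50`, `b_z`, `z_E`) are illustrative choices, and the inputs are assumed to be the precomputed dimensionless groups defined earlier in the paper, with positive values so that the square roots are real.

```python
import math


def gep1(h_H, b_B, d90_d50, A50, b_z):
    """Relative scour depth s/z from Eq. (36) (GEP-1); inputs are dimensionless."""
    t1 = 1.42 * h_H ** 1.27 * b_B - h_H
    t2 = math.sqrt(d90_d50) ** (1.11 * b_B * h_H * A50)
    t3 = d90_d50 * (-0.00234 * (d90_d50 * b_z - 9.82))
    return t1 + t2 + t3


def gep7(h_H, b_B, d90_d50, A50, z_E):
    """Relative scour depth s/z from Eq. (37) (GEP-7); z_E is the energy parameter z/E."""
    # Kept in the original form; algebraically equal to -A50 / d90_d50.
    t1 = A50 / ((b_B - d90_d50) - b_B)
    t2 = A50 * math.sqrt(math.sqrt(d90_d50) ** ((b_B - h_H) * h_H * d90_d50))
    t3 = b_B * (d90_d50 + z_E + 2.0 * h_H) * (math.sqrt(z_E) - z_E)
    return t1 + t2 + t3
```

The first bracket of Eq. (37) reduces to −A50·d50/d90; the sketch keeps the published form so that it can be checked line by line against the equation.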

Input parameter significance

The importance of each input parameter was assessed by removing the parameters one at a time from the first scenario. The results of the sensitivity analysis of the evaluated expert system models are presented in Table 4, and Fig. 4 compares the performance of the models across the scenarios.

Fig. 4

Comparison of the performance of the tested models based on different scenarios in terms of evaluation criteria

Based on the results in Fig. 4, in the first scenario, where all five parameters are included, the GEP model yields a lower error and a higher coefficient of determination than in the other scenarios. In the second scenario, where b/B is removed from the first scenario, the error increases somewhat and the coefficient of determination decreases. In the third scenario (with h/H removed), the GEP model has a higher error and a lower coefficient of determination than in the first scenario. In the fourth scenario, with d90/d50 removed, the error increases, but the coefficient of determination remains similar to that of the first scenario. In the fifth scenario, the error is significantly higher than in the first scenario (\(\mathrm{RMSE}_5/\mathrm{RMSE}_1 = 1.29/0.95\)), and the coefficient of determination is significantly lower (\(R^2_5/R^2_1 = 0.75/0.89\)). This shows that removing A50 has a great impact on the accuracy of scour depth predictions downstream of GCSs, confirming its significance. In the sixth scenario, with b/z removed, the error increases, whereas the coefficient of determination increases by about 0.1 (both compared with the first scenario). In the seventh scenario, adding the energy parameter z/E decreases the RMSE from 0.95 (first scenario) to 0.88 and reduces the magnitude of the MBE from −0.16 to −0.12. In conclusion, these results show that using the energy parameter in the GEP model can improve both RMSE and MBE.
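The scenario comparison above amounts to computing the relative change in RMSE with respect to the first scenario. A minimal sketch, assuming the per-scenario RMSE values have already been computed; only the GEP values quoted in this paragraph (scenarios 1, 5, and 7) are used, and the helper name `relative_rmse_change` is hypothetical.

```python
def relative_rmse_change(rmse_by_scenario, baseline=1):
    """Percentage change in RMSE of each scenario relative to the baseline scenario.

    Positive values mean the scenario is less accurate than the baseline.
    """
    base = rmse_by_scenario[baseline]
    return {s: 100.0 * (r - base) / base
            for s, r in rmse_by_scenario.items() if s != baseline}


# GEP RMSE values quoted in the text (scenario 5 removes A50, scenario 7 adds z/E).
gep_rmse = {1: 0.95, 5: 1.29, 7: 0.88}
changes = relative_rmse_change(gep_rmse)
most_influential = max(changes, key=changes.get)  # scenario with the largest RMSE increase
```

A negative entry (here scenario 7) marks a scenario that improves on the baseline, while the largest positive entry identifies the input whose removal hurts the model most.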

For the GMDH, LSSVM, and ELM models, the highest accuracy of scour depth prediction is obtained in the first scenario, and the highest error and lowest coefficient of determination are seen in the fifth scenario, where A50 is removed. Adding the energy parameter in the seventh scenario decreases the RMSE of the GMDH and ELM models from 1.06 and 1.02 (first scenario) to 0.99 and 0.97, respectively, and increases their coefficients of determination from 0.84 and 0.86 (first scenario) to 0.86 and 0.88, respectively. The LSSVM model shows almost the same error and coefficient of determination for the first and fifth scenarios (R2 = 0.71).

In general, all of the developed expert system models exhibit the highest error and the lowest correlation in scenario 5, where A50 is removed. Therefore, A50 can be considered the most significant parameter for predicting the scour depth. This conclusion is consistent with the results of Sattar et al. (2017) and Guven (2011). However, the sensitivity analysis of Guven (2011) also showed that removing d90/d50 significantly decreases the accuracy of scour prediction, which differs from the present finding that A50 is the most significant parameter. Tavakolizadeh and Kashefipour (2008) reported that all of the b/B, A50, h/H, b/z, and d90/d50 parameters were significant for scour depth prediction, which partially supports the present results. Najafzadeh (2015) found b/B to be the most effective parameter in predicting the scour depth.

In all scenarios, GEP provides the lowest error and the highest coefficient of determination, followed by the ELM and GMDH models, whereas the LSSVM predictions give the highest error and the lowest coefficient of determination. Thus, the GEP model produces more accurate scour depth predictions than the other models. This is consistent with Mesbahi et al. (2016), who showed that GEP is the best approach for predicting scour downstream of structures, and with Moussa (2013), who reported reasonable performance of GEP in predicting the scour depth downstream of hydraulic structures. Furthermore, the ELM model performs better than the LSSVM model, which is consistent with the results of Nourani et al. (2017). Huang et al. (2012) also showed the superiority of ELM over LSSVM in regression and classification applications, and Ebtehaj et al. (2018) reported that ELM provides significantly better results than regression-based models in predicting the scour depth around bridge piers.

Uncertainty, reliability, and resiliency analysis results

Figure 5 compares each model with the observed dataset (labeled "actual" on the horizontal axis) in terms of three statistics on a Taylor diagram, which summarizes several evaluation indices of multiple models in a single polar plot by comparing the predicted values with the actual ones (Riahi-Madvar et al. 2019b; Zhu et al. 2019). The reference point represents the observations and lies on the horizontal axis at a radial distance equal to their standard deviation. The radial distance of each model point from the origin is the standard deviation of its predictions, and its azimuth angle indicates the correlation coefficient between the actual and predicted values. The distance from a model point to the reference point is the centered root-mean-square difference (RMSD) between the predicted and actual values, so the closer a model is to the reference point, the more accurate it is. In the testing phase, the GEP model has the minimum RMSD (0.88) and the highest correlation coefficient of the four models. As shown in Fig. 5, GEP is the best model and lies closest to the reference point.
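The three statistics on a Taylor diagram are linked by the law of cosines, RMSD² = σ_obs² + σ_pred² − 2·σ_obs·σ_pred·r, which is why a single point can encode all of them. The sketch below computes them from paired observed and predicted values; the function names `taylor_stats` and `taylor_point` are hypothetical, and the use of population (divide-by-n) moments is an assumption of this sketch, not necessarily the convention used to draw Fig. 5.

```python
import math


def taylor_stats(obs, pred):
    """Return (sigma_obs, sigma_pred, r, crmsd) for a Taylor diagram."""
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("obs and pred must have the same length (>= 2)")
    mo, mp = sum(obs) / n, sum(pred) / n
    # Population (divide-by-n) moments make the law-of-cosines identity hold.
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    # Centred RMSD: the mean of each series is removed before differencing.
    crmsd = math.sqrt(sum(((p - mp) - (o - mo)) ** 2
                          for o, p in zip(obs, pred)) / n)
    return so, sp, r, crmsd


def taylor_point(obs, pred):
    """Cartesian position of a model; the reference point sits at (sigma_obs, 0)."""
    _, sp, r, _ = taylor_stats(obs, pred)
    theta = math.acos(max(-1.0, min(1.0, r)))  # azimuth encodes the correlation
    return sp * math.cos(theta), sp * math.sin(theta)
```

The distance from `taylor_point(obs, pred)` to `(sigma_obs, 0)` equals the centred RMSD, which is the geometric reason that "closest to the reference point" means "most accurate".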

Fig. 5

Taylor diagram, performance of different models versus actual values in the testing phase

Monte Carlo simulation is used to quantify the uncertainty of the modeling process. In this method, the input parameters are described by probability distributions, from which sets of input data are randomly generated. A sufficiently large number of simulations (typically 1000) must be performed so that the probability distribution of the output variable does not depend on the number of runs. Here, the uncertainty and robustness of the proposed model (GEP-7) are investigated, and the quality of the fitted models is evaluated using three indices: 95PPU, reliability, and resiliency. The evaluation results are reported in Table 5.
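A minimal sketch of Monte Carlo propagation of input uncertainty is given below. It assumes independent, normally distributed inputs with user-supplied means and standard deviations; the distributions actually used for GEP-7 are not stated in this section, and `toy_model` is a hypothetical stand-in for the fitted equation.

```python
import random
import statistics


def monte_carlo(model, input_dists, n_runs=1000, seed=0):
    """Evaluate `model` on `n_runs` randomly sampled input sets.

    input_dists maps each keyword argument of `model` to (mean, std);
    inputs are assumed independent and normally distributed.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    outputs = []
    for _ in range(n_runs):
        sample = {name: rng.gauss(mu, sigma)
                  for name, (mu, sigma) in input_dists.items()}
        outputs.append(model(**sample))
    return outputs


def summarize(outputs):
    """Mean, standard deviation and 2.5th/97.5th percentiles of the outputs."""
    q = statistics.quantiles(outputs, n=40, method="inclusive")
    return {"mean": statistics.fmean(outputs),
            "std": statistics.stdev(outputs),
            "p2.5": q[0],
            "p97.5": q[-1]}


# Hypothetical stand-in for a fitted scour equation (not GEP-7 itself).
def toy_model(x, y):
    return x + 2.0 * y


runs = monte_carlo(toy_model, {"x": (1.0, 0.1), "y": (3.0, 0.2)})
stats = summarize(runs)
```

To check that 1000 runs are enough, the summary can be recomputed with more runs (or another seed) and compared; if the percentiles barely change, the output distribution has stabilized.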

Table 5 Comparisons of the uncertainty, reliability, and resiliency analysis of the model results

According to Table 5, the uncertainties of the models in the training phase do not differ significantly. In the testing phase, however, the GEP model with 95PPU = 62 is slightly better than LSSVM (95PPU = 64), ELM (95PPU = 65), and GMDH (95PPU = 65). In the training phase, the scour depths predicted by GEP are more reliable (56.96%) and resilient (57.58%) than those of LSSVM (46.29% and 45.53%), ELM (26.96% and 25.60%), and GMDH (28.70% and 27.44%). The testing-phase results are similar: GEP, with a reliability of 52.63% and a resiliency of 67.57%, is rated better than the other three models.

For a more comprehensive assessment of the GEP-7 model, two criteria, the confidence bounds (95PPU) and the d-factor, are used in the training and testing phases. The 95PPU criterion is the percentage of observed data falling within the 95% confidence bounds, which are determined from the 2.5th and 97.5th percentiles of the cumulative distribution of the simulated output. The d-factor measures the width of the confidence bounds; its theoretical best value is zero, and the higher the d-factor, the greater the uncertainty. Because a wider band (higher d-factor) brackets more observations, the two criteria complement each other: a model whose band captures many observations only because it is very wide is not useful, owing to its high uncertainty. Ideally, 100% of the observations would fall within the 95PPU band with a d-factor close to zero. As mentioned before, 1000 Monte Carlo simulation runs are performed to calculate the uncertainty of the GEP model. The best model is therefore the one with the narrowest confidence bounds (minimum difference between the lower and upper bands) and the highest percentage of observations within the 95PPU band.
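The two criteria can be computed from the Monte Carlo ensembles as sketched below, following the definitions commonly used with Abbaspour et al. (2007): 95PPU is the percentage of observations bracketed by the 2.5th to 97.5th percentile band, and the d-factor is the mean band width divided by the standard deviation of the observations. The function name `ppu95_and_dfactor`, the use of the population standard deviation, and the linearly interpolated percentiles are assumptions of this sketch.

```python
import statistics


def ppu95_and_dfactor(observed, ensembles):
    """Return (95PPU in %, d-factor).

    ensembles[i] is the list of Monte Carlo outputs for observation i.
    """
    if len(observed) != len(ensembles) or len(observed) < 2:
        raise ValueError("need one ensemble per observation and >= 2 observations")
    lower, upper = [], []
    for sims in ensembles:
        q = statistics.quantiles(sims, n=40, method="inclusive")
        lower.append(q[0])   # 2.5th percentile
        upper.append(q[-1])  # 97.5th percentile
    bracketed = sum(lo <= o <= up for o, lo, up in zip(observed, lower, upper))
    ppu = 100.0 * bracketed / len(observed)
    mean_width = statistics.fmean([up - lo for lo, up in zip(lower, upper)])
    d_factor = mean_width / statistics.pstdev(observed)
    return ppu, d_factor
```

A high 95PPU with a small d-factor is the desirable combination; widening the band raises both numbers at once, which is why they are read together.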

The d-factor and 95% confidence bounds of the scour depth predictions of the best model (GEP-7) in the training and testing phases are shown in Fig. 6; the 95PPU values are 69% and 62%, respectively. A model is considered acceptable if more than 80% of the observations fall within the 95PPU band, although a value of 50% can be acceptable when the measured data are of imperfect quality (Abbaspour et al. 2007). The d-factor values are 0.25 and 0.40 for the training and testing phases, respectively, and Xue et al. (2014) and Abbaspour et al. (2007) reported that a d-factor below 1 is desirable. Therefore, given the narrow confidence bounds in the training and testing phases (d-factor = 0.25 and 0.40) and the 95PPU values, the GEP-7 model achieves an acceptable uncertainty in both phases.

Fig. 6

95PPU band for the estimates of scour depth (s/z) using GEP-7 in comparison with observed values: (a) training phase and (b) testing phase

Comparison of GEP with existing equations

To compare the results of the best scenarios (1 and 7) with the existing equations shown in Table 3, the dataset of Lenzi et al. (2000), which was not used in the training and testing phases, is applied as validation data. The existing equations are classified into regression-based equations (Eqs. (14–18, 21)) and AI-based equations (Eqs. (19 and 20)). The results of these comparisons are presented in Table 6. The RMSE values of Yen (1987), Guven (2011), and D’Agostino and Ferro (2004) are 3.77, 9.81, and 2.32, and their MBE values are 3.33, 2.85, and 1.99, respectively. Compared with the developed GEP models, these values indicate very low accuracy and significant overestimation of the scour depth by these equations.
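For reference, RMSE and MBE can be computed as below. The sign convention (MBE is the mean of predicted minus observed values, so positive values indicate overestimation) is assumed here; it matches the way the text reads positive MBE values as overestimation and negative ones as underestimation.

```python
import math


def _check(observed, predicted):
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")


def rmse(observed, predicted):
    """Root-mean-square error."""
    _check(observed, predicted)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def mbe(observed, predicted):
    """Mean bias error: positive means overestimation on average."""
    _check(observed, predicted)
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)
```

RMSE measures the overall size of the errors, whereas MBE reveals systematic bias; an equation can have a modest MBE and still a large RMSE if its over- and underestimates partly cancel.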

Table 6 Evaluation of the developed GEP and existing models to predict the scour depth for validation data

The AIC index is used for a fair comparison of equations with different complexities. AIC combines the goodness of fit with a penalty on the number of model parameters, so its value increases as the number of parameters increases. It therefore allows a fair comparison between simple equations with few parameters and complex equations with many parameters; in general, a lower AIC indicates a more efficient model. For instance, the RMSE and MBE of Eq. (26) were slightly smaller than those of Laucelli and Giustolisi (2011), so based on these criteria alone Eq. (26) would be judged the better equation. However, AIC indicates a better estimation of the scour depth by the Laucelli and Giustolisi (2011) equation because of its lower complexity and smaller number of parameters.
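The exact AIC expression used in this study is defined in an earlier section and is not repeated here. The sketch below assumes the common least-squares form AIC = n·ln(RSS/n) + 2k, where RSS is the residual sum of squares and k the number of fitted parameters, to show how two equations with the same fit but different numbers of parameters are ranked; `aic_least_squares` is a hypothetical helper.

```python
import math


def aic_least_squares(observed, predicted, n_params):
    """AIC = n * ln(RSS / n) + 2k (least-squares form, assumed here)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")
    rss = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("AIC is undefined for a perfect fit (log of zero)")
    return n * math.log(rss / n) + 2 * n_params
```

With identical residuals, an equation with five parameters scores 4 units worse than one with three, so the simpler equation is preferred unless the extra terms reduce RSS enough to compensate.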

The GEP-7 and GEP-1 models, with AIC values of −31.65 and −13.10, respectively, show the best performance in predicting the scour depth, followed by the equations proposed by Sattar et al. (2017) and Guven (2011). Guven (2011) shows the highest AIC value, and the equations proposed by Yen (1987) and D’Agostino and Ferro (2004) show the lowest performance in predicting the scour depth.

Figure 7 displays the s/z values estimated by the various equations for the validation dataset. The Guven (2011) equation significantly overestimates some of the scour depths, and the equation proposed by Yen (1987) mostly overestimates, predicting scour depths 3–5 times higher than the observed values. Given the large errors of the equations proposed by Yen (1987), Guven (2011), and D’Agostino and Ferro (2004), their results are not displayed in the scatter plot (Fig. 7). The scour depths estimated by Guven (2011) are more scattered than those of the other equations, and the range of predicted s/z is greater than 1. For instance, the observed s/z values of 1.75 and 1.78 are predicted as 1.97 and 2.61, respectively, and the observed values of 1.22 and 1.29 as 1.18 and 0.78.

Fig. 7

Comparison of developed GEP and existing models

Based on the results in Fig. 7, all values predicted by Ben Meftah and Mossa (2020) lie on or below the −25% line, indicating significant underestimation, and the error of this equation increases at higher s/z values. In contrast, the equation of Laucelli and Giustolisi (2011) overestimates the scour depth by a factor of about two at most points, with an MBE of 0.93, and nearly all of its points lie on or beyond the ±25% lines. This equation overestimates the lower s/z values and underestimates the higher ones.

As presented in Fig. 7, the equation proposed by Sattar et al. (2017) underestimates the scour depth as s/z increases, although all its points lie within the ±25% range at low s/z values. All scour depth predictions of GEP-1 and nearly all of those of GEP-7 lie within the ±25% range. GEP-7 confirms that the GEP-based models succeed in accurately predicting the scour depth at both lower and higher s/z values. Hence, the proposed model predicts the scour depth downstream of GCSs better than the previous empirical and AI models.
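The ±25% criterion used in this scatter-plot discussion can be expressed as the share of points whose relative error does not exceed 25% of the observed value. A minimal sketch; `fraction_within_band` is a hypothetical helper, and measuring the band relative to the observed value is an assumption about how the ±25% lines are drawn.

```python
def fraction_within_band(observed, predicted, band=0.25):
    """Share of points with |predicted - observed| <= band * |observed|."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    inside = sum(abs(p - o) <= band * abs(o) for o, p in zip(observed, predicted))
    return inside / len(observed)


# The four Guven (2011) predictions quoted in the text for the validation data.
observed = [1.75, 1.78, 1.22, 1.29]
predicted = [1.97, 2.61, 1.18, 0.78]
share = fraction_within_band(observed, predicted)
```

For these four quoted points the share is 0.5: the first and third predictions fall inside the band, while 2.61 and 0.78 fall outside it.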

Conclusion

Given the importance of scouring downstream of hydraulic structures, which could destabilize, damage, or even destroy the structure, this study investigates the most important parameters influencing the scour depth downstream of grade-control structures using a scenario-based approach with expert system models. To assess the modeling results, error indices (R2, RMSE, MBE, AIC) and graphical evaluations (scatter plots, boxplots, column plots) are used in the different phases (training, testing, and validation). The results indicate the superiority of the energy- and GEP-based approach (R2 = 0.86, RMSE = 0.16, MBE = 0.05, AIC = −31.65) over the ELM, LSSVM, and GMDH models. The scenario-based results reveal that eliminating the b/B, h/H, d90/d50, A50, and b/z parameters from the GEP-1 model increases the RMSE by 3, 6, 3, 30, and 4%, respectively, whereas adding the energy parameter to GEP-1 decreases the error by 8%. A50 is therefore the most significant parameter for predicting the scour depth: when it is eliminated, the error of the GEP, ELM, GMDH, and LSSVM models increases by 30, 34, 42, and 63%, respectively. The results of this study confirm that the energy-based approach increases the accuracy of scour depth predictions. The GEP-based equations are more accurate and consistent than the previous equations and agree well with the observed field data in the validation phase. Since empirical equations for predicting the maximum scour depth are applicable only within a specific range of data and laboratory conditions, soft computing methods can be recommended for scour depth prediction. Because uncertainty is important for the proposed relationship, it was assessed with the Monte Carlo simulation method, and the results were analyzed using three indices: 95PPU, reliability, and resiliency.
The results indicate that the proposed relationship has the least uncertainty and that an acceptable percentage of the data falls within the 95% confidence intervals in the testing phase. The proposed model has several advantages over conventional relations and other artificial intelligence models, including an explicit relationship for predicting the scour depth and the use of the flow energy parameter to improve prediction accuracy. Furthermore, the GEP-based model predicts the scour depth with higher accuracy, reliability, and resiliency than the other models.

The uncertainty analysis framework developed in this study is a new approach to reliable scour depth prediction, and the extracted equation can be combined with mathematical models of sediment transport and scour hole geometry or used in real-case modeling of scour around hydraulic structures. Finally, the authors acknowledge that the different data measurement and collection methods used in previous studies constitute a major limitation of this study and a potential source of error in compiling the data set for machine learning.