1 Introduction

Scour is a natural phenomenon that occurs in alluvial streams as a result of the erosive action of flowing water (Oliveto and Hager 2005; Eghlidi et al. 2020). Abutments are constructed near stream banks to support a bridge. It is generally recognized that river-bed erosion and scouring undermine abutments and are the leading causes of abutment failure (Pandey et al. 2020; Afzal et al. 2021). Natural processes or man-made interventions can cause local scour in streams (Barbhuiya and Dey 2004; Kothyari et al. 2007; Goyal and Ojha 2011). Abutments and other hydraulic structures often fail because of scour (Pandey et al. 2020). Earlier studies on scour around abutments tended to focus on the prediction of maximum and time-dependent scour depth (Barbhuiya and Dey 2004; Kothyari et al. 2007). The Federal Highway Administration 1973 report illustrates that more than 400 bridges failed due to pier and abutment scour (Pandey et al. 2018). These collapsed bridges show the importance of realistic prediction of scour around bridge elements (Kumar et al. 2022). Scour around abutments in natural streams is still a matter of concern, even though numerous researchers and engineers have proposed mathematical and numerical models based on laboratory and field studies (Barbhuiya and Dey 2004; Dey and Barbhuiya 2005). Therefore, improving the understanding of the abutment scour phenomenon and its processes is vital for calculating the maximum abutment scour depth at equilibrium scour conditions. Oliveto and Hager (2005) carried out an experimental study on abutment scour and stated the minimum laboratory dimensions required to apply Froude number similarity. They also checked the influence of sloping abutments on scour. As long as the limitations of the computational approach are respected, the outcomes of their study could be applied in practice.
Dey and Barbhuiya (2005) completed an experimental flume study on abutment scour and derived a semi-empirical method to calculate the scour depth around short vertical-wall, semicircular, and 45° wing-wall abutments (length/flow depth \(\le\) 1). Treating the horseshoe vortex system as the prime agent of abutment scour, they applied the conservation of sediment mass to analyze the experimental data. The maximum scour depth at equilibrium conditions around abutments is of key interest to river and bridge engineers (Barbhuiya and Dey 2004).

The scour processes and flow patterns around abutments are so complicated that it is hard to derive a general empirical relationship (Azamathulla et al. 2010, 2013; Singh et al. 2020). The scour depth around an abutment can be calculated using numerous empirical relationships (Dey and Barbhuiya 2005; Mohammadpour et al. 2017). However, each relationship agrees well with experimental values only for the specific datasets from which it was derived. Previous studies show the importance of predicting abutment scour precisely. Further, insufficient field data lead to uncertainty in abutment scour equations (Mohammadpour et al. 2016).

With this in view, precise prediction of abutment scour is cumbersome and calls for robust approaches such as gene-expression programming (GEP), artificial neural networks (ANN), generalized reduced gradient (GRG), genetic algorithms (GA), evolutionary polynomial regression, modern multi-level ensemble approaches, and adaptive neuro-fuzzy inference systems (ANFIS) (Azamathulla et al. 2010; Mohammadpour et al. 2016; Mohammadpour 2017; Aamir and Ahmad 2019; Singh et al. 2022). Najafzadeh and Azamathulla (2013) applied soft computing approaches to calculate pipeline scour and reported good agreement with experimental datasets. Mohammadpour et al. (2016) applied ANN and ANFIS to calculate abutment scour and stated that both could be successfully used to predict the scour depth at different bridge elements.

In this study, the authors aim to examine the existing maximum scour depth data at equilibrium scour conditions for rectangular wall abutments. Existing abutment scour depth datasets (263 datasets) are collected from Coleman et al. (2003) and Dey and Barbhuiya (2005). Further, 45 flume experiments are carried out at the National Institute of Technology in Warangal, India. Previously proposed empirical relationships are also examined to analyze their performance on the available datasets (a total of 308 datasets). Although the application of robust approaches to maximum scour depth prediction has been of interest to several investigators because of their accuracy and simplicity, no research work has been undertaken to develop a machine learning approach to predict the maximum scour depth at the bridge abutment. However, previous studies on scour at other hydraulic structures have shown good agreement with observed values. Considering the significance of bridge abutment scour, which is mainly responsible for scour hazards, an effort has been made to develop machine learning based models with a wide range of experimental data. The main objectives of this study are: (1) to apply the CatBoost model as a new machine learning approach to model abutment scour depth using a vast experimental database; (2) to compare the developed model with common machine learning approaches such as K-nearest neighbor and extra tree models; and (3) to compare existing empirical equations with the machine learning models.

2 Methodology

2.1 Maximum Scour Depth Relationships

Numerous studies are available on abutment scour. Most studies have examined the maximum scour at equilibrium scour conditions. At equilibrium conditions, maximum scour depth is influenced by flow properties, sediment characteristics, and abutment geometry (Barbhuiya and Dey 2004; Bressan et al. 2011). The variables that influence the maximum abutment scour depth (dse) at equilibrium conditions in uniform sediment beds are as follows:

$$d_{se} = f\left( {V,y,\rho ,\upsilon ,d_{50} ,V_{cr} ,\rho_{s} ,L,g} \right)$$
(1)

where dse is the maximum abutment scour depth, V is the time-averaged flow velocity, y is the approach flow depth, \(\rho\) is the mass density of the fluid, \(\upsilon\) is the kinematic viscosity of the fluid, d50 is the median diameter of sediment, Vcr is the threshold velocity of sediment, \(\rho_{s}\) is the mass density of sediment, L is the transverse length of the abutment, and g is the gravitational acceleration.

For sediment-fluid interaction, Eq. (1) should not contain \(\rho ,\rho_{s} {\text{ and }}g\) as independent parameters (Dey and Barbhuiya 2005). Dey and Barbhuiya (2005) gave a better representation of the above-mentioned parameters, and hence Eq. (1) becomes

$$d_{se} = f\left( {F_{{d_{50} }} ,\frac{V}{{V_{cr} }},\frac{y}{L},\frac{L}{{d_{50} }}} \right)$$
(2)

where, densimetric Froude number \(F_{{d_{50} }} = \frac{V}{{\sqrt {(S - 1)gd_{50} } }}\), S is the relative density of sediment.
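The dimensionless groups of Eq. (2) can be evaluated directly from the dimensional variables. The following is a minimal sketch, not the authors' code: the function names, the default relative density S = 2.65 (typical of quartz sand), and the sample flow condition are assumptions made for illustration; only d50 = 0.27 mm and L = 12.5 cm correspond to values reported for the present experiments.

```python
import math

def densimetric_froude(V, d50, S=2.65, g=9.81):
    """F_d50 = V / sqrt((S - 1) * g * d50), with V in m/s and d50 in m.

    S = 2.65 is an assumed default, not a value given in the text.
    """
    return V / math.sqrt((S - 1.0) * g * d50)

def dimensionless_groups(V, V_cr, y, L, d50, S=2.65, g=9.81):
    """Return the four independent groups of Eq. (2) as a dictionary."""
    return {
        "F_d50": densimetric_froude(V, d50, S, g),
        "V/V_cr": V / V_cr,
        "y/L": y / L,
        "L/d50": L / d50,
    }

# Hypothetical flow condition (V, V_cr and y are illustrative only).
groups = dimensionless_groups(V=0.25, V_cr=0.30, y=0.10, L=0.125, d50=0.27e-3)
```

All lengths must share one unit system (metres here) so that the ratios remain dimensionless.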

Many researchers have derived mathematical relationships that calculate the maximum abutment scour depth using various parameters (Kandasamy and Melville 1998; Melville and Coleman 2000). Melville and Coleman (2000) proposed an abutment scour relationship based on empirical factors, or K-factors, which express the influence of flow, sediment, and abutment characteristics. These K-factors can be obtained by curve fitting, and the maximum abutment scour depth (dse) expressed as the product of K-factors is given as

$$d_{se} = K_{{d_{50} }} K_{I} K_{S} K_{y} K_{\alpha }$$
(3)

where dse is the maximum abutment scour depth at equilibrium condition, Kd50 is the sediment gradation factor, KI is the flow intensity factor, KS is the abutment shape factor, Kt is the time factor, Ky is the flow depth–abutment size factor, and \(K_{\alpha }\) is the abutment alignment factor. For vertical wall abutments, KS = \(K_{\alpha }\) = 1.

K factors can be calculated using different empirical equations, given as

$$\left. \begin{gathered} K_{{d_{50} }} \left( {L/d_{50} \le 25} \right) = 0.57\log \left( {2.24\frac{L}{{d_{50} }}} \right) \hfill \\ K_{{d_{50} }} \left( {L/d_{50} > 25} \right) = 1 \hfill \\ \end{gathered} \right\}$$
(4)
$$\left. \begin{gathered} K_{I} = \frac{{V - \left( {V_{a} - V_{cr} } \right)}}{{V_{cr} }}, \, for \, \frac{{V - \left( {V_{a} - V_{cr} } \right)}}{{V_{cr} }} < 1 \hfill \\ K_{I} = 1, \, for \, \frac{{V - \left( {V_{a} - V_{cr} } \right)}}{{V_{cr} }} \ge 1 \hfill \\ \end{gathered} \right\}$$
(5)
$$\left. \begin{gathered} K_{y} \left( {L/y \le 1} \right) = 2L \hfill \\ K_{y} \left( {1 < L/y < 25} \right) = 2\left( {yL} \right)^{0.5} \hfill \\ K_{y} \left( {L/y \ge 25} \right) = 10y \hfill \\ \end{gathered} \right\}$$
(6)

where Va is armor peak velocity.
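Equations (3) to (6) translate into a few short functions. The sketch below is illustrative only: the function names are hypothetical, the logarithm in Eq. (4) is assumed to be base 10 (which makes the two branches nearly meet at L/d50 = 25), and the armor peak velocity is assumed to equal the threshold velocity when no value is supplied, as is commonly done for uniform sediment.

```python
import math

def K_d50(L, d50):
    """Sediment factor, Eq. (4); log is assumed to be base 10."""
    ratio = L / d50
    return 0.57 * math.log10(2.24 * ratio) if ratio <= 25 else 1.0

def K_I(V, V_cr, V_a=None):
    """Flow intensity factor, Eq. (5)."""
    if V_a is None:
        V_a = V_cr  # assumption: uniform sediment, so V_a = V_cr
    intensity = (V - (V_a - V_cr)) / V_cr
    return intensity if intensity < 1 else 1.0

def K_y(L, y):
    """Flow depth-abutment size factor, Eq. (6); carries units of length."""
    ratio = L / y
    if ratio <= 1:
        return 2.0 * L
    if ratio < 25:
        return 2.0 * math.sqrt(y * L)
    return 10.0 * y

def scour_depth_mc(V, V_cr, y, L, d50, K_S=1.0, K_alpha=1.0, V_a=None):
    """Equilibrium scour depth from Eq. (3); K_S = K_alpha = 1 for vertical walls."""
    return K_d50(L, d50) * K_I(V, V_cr, V_a) * K_S * K_y(L, y) * K_alpha

# Hypothetical clear-water condition (values are illustrative only).
d_se = scour_depth_mc(V=0.24, V_cr=0.30, y=0.10, L=0.125, d50=0.27e-3)
```

Because Ky has units of length and the other factors are dimensionless, the result is in the same length unit as y and L.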

Other abutment scour depth relationships for vertical-wall abutments under different regimes are given in Table 1.

Table 1 Summary of maximum scour depth relationships for vertical-wall abutments

2.2 Description of Collected Data from Literature and Present Experimental Work

A wide range of existing abutment scour depth datasets (263 datasets) have been collected from (Coleman et al. 2003) and (Dey and Barbhuiya 2005). In addition, 45 flume experiments have been carried out at the NIT Warangal, India. Additional tests were completed in a fixed bed masonry rectangular flume. Flume dimensions were 16.0 m long, 1.0 m wide, and 0.40 m deep. The flume's test section started 8.0 m from its entrance and had dimensions of 3.0 m long, 0.80 m wide, and 0.25 m deep. Uniform sand with a median diameter of 0.27 mm and a geometric standard deviation of 1.17 was used as the sediment bed. We used vertical wall abutment models with transverse lengths viz. 5.0, 6.8, 7.5, 8.4, 9.8, 10.6, 12.5, 15.0, 17.5 and 19.0 cm. All tests were conducted under non-submerged conditions. A valve was fixed into the flume inlet pipe to control the flowrate. At the downstream end of the flume, a tail gate and a pre-calibrated rectangular notch were fixed to maintain the flow depth and measure the flowrate, respectively. The experiments were performed until they reached an equilibrium scour, i.e., no change in scour geometry over time. All tests in this study were carried out for 20 h. Experimentally, it was observed that the maximum scour depth (dse) at equilibrium condition is located at the upstream nose of the abutment. The Vernier point gauge was used to determine the maximum abutment scour depth under equilibrium conditions. Due to the fact that all experiments were conducted in clear-water scour, the threshold velocity ratio was always less than one.

The vertical-wall abutment model was fixed in the test section at the right side of the flume, 9.5 m from the flume entrance, prior to the start of the experiment (see Fig. 1). The sand bed was leveled to match the bed slope and then covered with a 3 mm acrylic sheet to avoid unwanted scouring around the abutment. The desired flow conditions were established using the inlet valve and the flume tail gate, after which the acrylic sheet was carefully removed. Table 2 summarizes several parameters that influence the maximum abutment scour depth.

Fig. 1

a Flume layout, b photographic view of the running experiment

Table 2 Summary of influencing parameters for all datasets

Table 3 summarizes the statistical analysis of all data utilized in this investigation. According to the values of kurtosis and skewness presented in Table 3, the densimetric Froude number (\({F}_{d50}\)) has an approximately normal distribution. However, other dimensionless input variables do not follow the normal distribution. The dimensionless output parameter (\({d}_{se}/L\)) also approximately follows the normal distribution.

Table 3 Descriptive statistics for all variables

Correlation and regression analyses should be utilized to determine which variables influence the target variable. Using Pearson correlation coefficients, Fig. 2 depicts the relationship between the dimensionless abutment scour depth (\({d}_{se}/L\)) and the input variables. Correlation coefficients with a positive value suggest a direct association between two variables, while those with a negative value imply an inverse relationship. According to Fig. 2, the ratio of flow depth to transverse length of the abutment (\(y/L\)), with a correlation coefficient of +0.75, is the most influential input variable in predicting scour. The densimetric Froude number (\({F}_{d50}\)), with a correlation coefficient of +0.51, is the second most influential variable. The lowest correlation coefficient in magnitude, +0.4, belongs to the mean velocity to critical velocity ratio (\(V/{V}_{c}\)). The ratio of the transverse length of the abutment to the diameter of sediment particles, with a correlation coefficient of -0.45, has the opposite effect on scour depth: as it increases, the amount of scour decreases.
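For reference, the Pearson coefficient used in the correlogram can be computed as in the short sketch below. The function name and the sample numbers are hypothetical and are not taken from the dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical values: a direct and an inverse association.
y_over_L = [0.5, 1.0, 1.5, 2.0]
direct = pearson_r(y_over_L, [1.0, 2.0, 3.0, 4.0])
inverse = pearson_r(y_over_L, [4.0, 3.0, 2.0, 1.0])
```

A value near +1 or -1 indicates a strong linear association; the coefficient does not capture non-linear dependence, which is one reason tree-based feature importance is examined separately.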

Fig. 2

Correlogram of input and target variables

2.3 Gradient Boosting Decision Tree (GBDT) for Feature Selection

Many machine learning algorithms have been proposed and utilized to solve classification and regression problems over the years. Among them, the gradient boosting decision tree (GBDT) is one of the most popular algorithms for handling classification and regression problems by integrating weak decision trees (Friedman 2002). In other words, the GBDT model is an ensemble of decision trees that combines a series of weak base learners with many leaf nodes and avoids the overfitting problem (Tao et al. 2022). The error in each node is measured using the weak learners, and a test function is utilized for splitting the node (Fan et al. 2018). The comprehensive background of GBDT can be found in Friedman (2002). In this study, the GBDT algorithm is used for feature selection in predicting the abutment scour depth (ASD).

2.4 Machine Learning Approaches

2.4.1 Categorical Boosting (CatBoost)

CatBoost is a machine learning algorithm introduced by Prokhorenkova et al. (2018) for dealing with categorical features. It belongs to the gradient boosting decision tree (GBDT) family but differs in how it works. CatBoost has been reported to handle complex and noisy data better than other gradient boosting algorithms such as XGBoost (extreme gradient boosting) and LightGBM (light gradient boosting machine) (Ke et al. 2017). Recently, the CatBoost algorithm has been widely used in hydrological modeling, for example in reference evapotranspiration estimation (Bian et al. 2020), pan-evaporation estimation (Dong et al. 2021), and prediction of flash flood susceptibility (Saber et al. 2021). The enhancements of CatBoost comprise the following aspects:

  1. Manage categorical features during the training period instead of the pre-processing period. CatBoost permits the complete dataset to be used for training. The Greedy target-based statistics (Greedy TBS) method is used for handling categorical features with the least information loss. Specifically, CatBoost performs a random permutation of the dataset and, for each sample, calculates an average label value over the samples with the same category value that are positioned before the given one in the permutation. Given a dataset of observations \(D=\left\{{X}_{i}, {Y}_{i}\right\}, i=1,\dots ,n\) and a permutation \(\theta ={\left({\sigma }_{1},{\sigma }_{2},\dots ,{\sigma }_{n}\right)}_{n}^{T}\), the category value is replaced with (Prokhorenkova et al. 2018):

    $${x}_{{\sigma }_{p},k}=\frac{\sum_{j=1}^{p-1} \left[{x}_{{\sigma }_{j},k}={x}_{{\sigma }_{p},k}\right]\times {Y}_{{\sigma }_{j}}+\beta \times P}{\sum_{j=1}^{p-1} \left[{x}_{{\sigma }_{j},k}={x}_{{\sigma }_{p},k}\right]+\beta }$$
    (7)

Here, \(\beta\) is the weight of the prior and \(P\) is the prior value. The prior is the average label value in the dataset, which helps reduce the noise from low-frequency categories.

  2. Feature combinations. CatBoost integrates all categorical features and their combinations in the current tree with all categorical features in the dataset using a greedy method.

  3. Unbiased boosting with categorical features. CatBoost uses an ordered boosting method to overcome the gradient bias (Prokhorenkova et al. 2018) and improve the generalization ability of the model.

  4. Fast scorer. CatBoost uses oblivious trees as base predictors, which apply the same splitting criterion across an entire level of the tree. These trees are less prone to overfitting and have a more balanced growth pattern. Each leaf index in an oblivious tree is represented as a binary vector with a length equal to the tree's depth. This representation is used in CatBoost model evaluators to compute model predictions, since all float features, statistics, and one-hot encoded features are binarized. Figure 3 shows the architecture of the CatBoost algorithm.
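The target statistic of Eq. (7) can be illustrated with a few lines of plain Python. This is a sketch of the formula only, not CatBoost's internal implementation; the function name, the category labels, and the target values are hypothetical (the scour dataset itself contains only numerical inputs).

```python
def ordered_target_statistics(categories, targets, permutation, beta=1.0, prior=None):
    """Encode one categorical feature following Eq. (7).

    `permutation` lists sample indices in the order sigma_1 ... sigma_n.
    Each sample is encoded using only the samples that precede it.
    """
    if prior is None:
        prior = sum(targets) / len(targets)  # P: average label value
    running_sum = {}    # sum of targets seen so far, per category
    running_count = {}  # number of samples seen so far, per category
    encoded = [0.0] * len(categories)
    for idx in permutation:
        cat = categories[idx]
        s = running_sum.get(cat, 0.0)
        c = running_count.get(cat, 0)
        encoded[idx] = (s + beta * prior) / (c + beta)
        running_sum[cat] = s + targets[idx]
        running_count[cat] = c + 1
    return encoded

# Hypothetical example with the identity permutation.
encoded = ordered_target_statistics(
    categories=["a", "b", "a", "a"],
    targets=[1.0, 0.0, 3.0, 5.0],
    permutation=[0, 1, 2, 3],
)
```

The first occurrence of each category receives the prior, and later occurrences blend the prior with the targets of earlier samples only, so a sample's own label never leaks into its encoding.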

Fig. 3

The flow diagram of the CatBoost model

2.4.2 K-Nearest Neighbor (KNN)

K-nearest neighbor (KNN) is a non-parametric technique that was first proposed by Fix and Hodges (1951) for classification and prediction problems (Karlsson and Yakowitz 1987). Previous work demonstrates its effective application in hydrology (Sikorska-Senoner and Quilty 2021). In KNN, the independent variables (or predictors) are the inputs, and the prediction for a new sample is derived from the training samples closest to it in the input space.
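A minimal sketch of KNN regression follows. It assumes Euclidean distance and an unweighted average of the neighbors' targets, neither of which is specified in the text; the function name and sample data are hypothetical.

```python
import math

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training samples."""
    # Pair each training sample's distance to the query with its target.
    distances = sorted(
        (math.dist(x, x_query), target) for x, target in zip(X_train, y_train)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical one-dimensional example.
X_train = [(0.0,), (1.0,), (2.0,), (10.0,)]
y_train = [0.0, 1.0, 2.0, 10.0]
prediction = knn_predict(X_train, y_train, x_query=(1.0,), k=3)
```

Because the prediction depends on distances, the inputs should be brought to a common scale before use, which is one motivation for normalizing the data.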

2.4.3 Extra Tree Regression (ETR)

Geurts et al. (2006) proposed extra tree regression (ETR), an ensemble machine learning model that performs regression or classification tasks by combining many decision trees (DT). A classical top-down procedure is used to construct the ETR model (Geurts et al. 2006). Many applications of the ETR model have been reported in different fields (Heddam et al. 2020; Seyyedattar et al. 2020; Asadollah et al. 2021). The ETR is a highly randomized version of random forest (RF) with two main differences: it grows each tree on the whole training sample rather than on a bootstrap replica, and it selects the cut-points of candidate splits at random (Geurts et al. 2006). The two main parameters of the ETR are \(K\) (the number of features randomly selected at each node) and \({n}_{min}\) (the minimum sample size for splitting a node); tuning them helps avoid overfitting and enhances the prediction accuracy of the model (Asadollah et al. 2021). In this research, the ETR model was developed with the scikit-learn library in Python for ASD prediction by tuning its parameters.
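The role of \(K\) and \({n}_{min}\) can be illustrated by the node-splitting step alone. The sketch below is a simplified illustration, not the scikit-learn implementation: it draws one random cut-point for each of K randomly chosen features and keeps the candidate with the largest variance reduction. The function names and sample data are hypothetical.

```python
import random

def variance(values):
    """Population variance; defined as 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pick_random_split(X, y, K, n_min, rng):
    """Return (feature_index, cut_point) for one node, or None to stop splitting."""
    if len(y) < n_min:
        return None  # node too small to split
    best = None
    for f in rng.sample(range(len(X[0])), K):  # K random candidate features
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature, cannot split
        cut = rng.uniform(lo, hi)  # cut-point drawn at random, not optimized
        left = [t for v, t in zip(column, y) if v < cut]
        right = [t for v, t in zip(column, y) if v >= cut]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
        score = variance(y) - weighted  # variance reduction of this candidate
        if best is None or score > best[0]:
            best = (score, f, cut)
    return None if best is None else (best[1], best[2])

# Hypothetical data: feature 1 is constant, so only feature 0 can be used.
X = [(0.1, 5.0), (0.4, 5.0), (0.6, 5.0), (0.9, 5.0)]
y = [1.0, 1.0, 3.0, 3.0]
split = pick_random_split(X, y, K=2, n_min=2, rng=random.Random(0))
```

A larger K makes each split less random, while a larger minimum sample size yields shallower trees and smoother predictions.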

2.5 Performance Indicators

The efficacy of the applied machine learning models, i.e., KNN, Extra Tree, and CatBoost, in predicting the abutment scour depth (ASD) was evaluated using five performance indicators: MAPE (mean absolute percent error), RMSE (root mean square error), R (coefficient of correlation), IA (Willmott agreement index) (Willmott 1982), and U95% (uncertainty coefficient with 95% confidence level) (Patino and Ferreira 2015). The mathematical formulas of the R, RMSE, MAPE, IA, and U95% indicators are as follows:

$$R=\frac{\sum_{\mathrm{i}=1}^{\mathrm{N}}\left({ASD}_{meas,i}- \overline{{ASD }_{meas}}\right) \left({ASD}_{pred,i} - \overline{{ASD }_{pred}}\right) }{\sqrt{\sum_{\mathrm{i}=1}^{\mathrm{N}}({{ASD}_{meas,i}- \overline{{ASD }_{meas}})}^{2} \sum_{\mathrm{i}=1}^{\mathrm{N}}({{ASD}_{pred,i} - \overline{{ASD }_{pred}})}^{2} }}$$
(8)
$$RMSE=\sqrt{\frac{1}{N} \sum_{i=1}^{N}({ASD}_{meas,i}- {ASD}_{pred,i}{)}^{2}}$$
(9)
$$MAPE=\frac{100}{N} \sum_{i=1}^{N}\left|\frac{{ASD}_{meas,i}- {ASD}_{pred,i}}{{ASD}_{meas,i}}\right|$$
(10)
$${I}_{A}=1-\left[\frac{{\sum }_{i=1}^{N}{\left({ASD}_{pred,i}-{ASD}_{meas,i}\right)}^{2}}{{\sum }_{i=1}^{N}{\left(\left|{ASD}_{pred,i}-\overline{{ASD }_{meas}}\right|+\left|{ASD}_{meas,i}-\overline{{ASD }_{meas}}\right|\right)}^{2}}\right]$$
(11)
$${U}_{95}=1.96\sqrt{{SD}^{2}+{RMSE}^{2}}$$
(12)

Here, \({ASD}_{meas,i}\) and \({ASD}_{pred,i}\) are the measured and predicted abutment scour depth (ASD) values for the ith data point; \(\overline{{ASD }_{meas}}\) and \(\overline{{ASD }_{pred}}\) are the means of the measured and predicted ASD values; \(SD\) is the standard deviation; and \(N\) is the total number of observations.
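The five indicators of Eqs. (8) to (12) can be computed together as in the sketch below. Two points are assumptions rather than statements from the text: \(SD\) is taken as the standard deviation of the prediction errors, and MAPE is returned in percent to match how it is reported in the results. The function name and sample values are hypothetical.

```python
import math

def performance_indicators(meas, pred):
    """Return R, RMSE, MAPE (%), IA and U95 for measured vs predicted values."""
    n = len(meas)
    mean_m = sum(meas) / n
    mean_p = sum(pred) / n
    errors = [m - p for m, p in zip(meas, pred)]

    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(meas, pred))
    var_m = sum((m - mean_m) ** 2 for m in meas)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    r = cov / math.sqrt(var_m * var_p)

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / m) for e, m in zip(errors, meas)) / n

    ia_den = sum((abs(p - mean_m) + abs(m - mean_m)) ** 2 for m, p in zip(meas, pred))
    ia = 1.0 - sum(e * e for e in errors) / ia_den

    mean_e = sum(errors) / n
    sd = math.sqrt(sum((e - mean_e) ** 2 for e in errors) / n)  # assumed: SD of errors
    u95 = 1.96 * math.sqrt(sd ** 2 + rmse ** 2)

    return {"R": r, "RMSE": rmse, "MAPE": mape, "IA": ia, "U95": u95}

# Hypothetical measured and predicted values.
scores = performance_indicators([1.0, 2.0, 4.0], [1.1, 1.9, 4.2])
```

Lower RMSE, MAPE, and U95 and higher R and IA indicate better agreement; MAPE is undefined when a measured value is zero.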

2.6 Model Development and Configuration

The data were normalized using the following equation to equalize the data scale:

$${x}_{normal}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$
(13)

where \({x}_{max}\) and \({x}_{min}\) denote the maximum and minimum values of the dataset used to generate the prediction models, respectively. Then, 70% of the data were used for training, and the remaining 30% were used as the test set for evaluating the models' performance.
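The normalization of Eq. (13) and the 70/30 partition can be sketched as follows. The text does not state how samples were assigned to each subset, so the seeded random shuffle is an assumption, as are the function names.

```python
import random

def min_max_normalize(values):
    """Scale a sequence to [0, 1] following Eq. (13)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)  # fixed seed for a reproducible partition
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Abutment lengths (cm) from the present experiments, scaled to [0, 1].
scaled = min_max_normalize([5.0, 12.5, 19.0])

# Row indices standing in for the 308 datasets.
train, test = train_test_split(list(range(308)))
```

In practice the minimum and maximum are often taken from the training subset only and then reused for the test subset, so that no information from the test data enters the scaling.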

The CatBoost model was used to predict scour depth around the bridge abutment in the present study. Two powerful models, Extra Tree Regression (ETR) and KNN, were used for comparison with the CatBoost model. The Gradient Boosting Decision Tree (GBDT) method was used to select the effective features for scour prediction. Figure 4 shows the importance of each input variable in the scour estimation. According to Fig. 4, the \(y / L\) ratio is the most effective factor in predicting scour, and Fd50 is the least important of the input variables. Based on the feature selection results, two combinations were therefore considered: comb1 (all variables: \(V/{V}_{c}\), \({F}_{d50}\), \(y/L\), \(L/{D}_{50}\)) and comb2 (all variables except Fd50: \(V/{V}_{c}\), \(y/L\), \(L/{D}_{50}\)).

Fig. 4

The importance of input variable based on GBDT feature selection

The proper adjustment of model parameters is one of the most important aspects of using machine learning models, since fine-tuning the parameters leads to higher accuracy. The grid search method was used to find the optimal values of the machine learning model parameters. All models were implemented on a PC with an Intel Core i7-10750H 2.6 GHz processor and 32 GB of RAM. The Extra Tree Regression (ETR) and KNN models and the feature selection algorithm were developed with the Scikit-Learn package (Pedregosa et al. 2011), and the CatBoost package (Dorogush et al. 2018) in Python was used to implement the CatBoost model. In the CatBoost model, four main parameters must be set: the learning rate (learning_rate), tree depth (depth), number of iterations (iterations), and L2 regularization (l2_leaf_reg). In the present study, learning rates in the range [0.001–0.5], depths in [2–20], iterations in [200–2000], and L2 regularization in [0.5–1.5] were considered. Table 4 presents the optimal parameters of the CatBoost model for the two input combinations, comb1 and comb2. In the Extra Tree model, two important parameters need to be adjusted: the number of trees in the forest (n_estimators) and the maximum depth of the tree (max_depth). The maximum depth of the tree was varied in [2–20], and the number of trees in the forest in [10–150]. Table 4 also presents the optimal parameters of the ETR model for both input combinations. The only tuned parameter of the KNN model is the number of neighbors, which was varied in [1–10]; its optimal values are presented in Table 4 for both input combinations. Figure 5 shows the schematic flowchart of the present study.
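The grid search itself is a plain exhaustive loop over parameter combinations. The sketch below is generic and library-independent: `grid_search` and `toy_error` are hypothetical names, the grid points are illustrative samples inside the ranges stated above (the actual grid resolution is not given in the text), and a real `evaluate` callable would train a model with the given parameters and return its validation error.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every parameter combination and return (best_params, best_score).

    `evaluate` maps a parameter dict to an error to be minimized.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Illustrative grid points within the stated CatBoost ranges.
catboost_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 0.5],
    "depth": [2, 6, 10],
    "iterations": [200, 1000, 2000],
    "l2_leaf_reg": [0.5, 1.0, 1.5],
}

def toy_error(params):
    """Stand-in objective so the sketch runs without training any model."""
    return abs(params["learning_rate"] - 0.1) + abs(params["depth"] - 6)

best_params, best_score = grid_search(toy_error, catboost_grid)
```

The cost grows multiplicatively with the number of values per parameter (4 x 3 x 3 x 3 = 108 evaluations here), which is why coarse grids are usually refined only around promising regions.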

Table 4 Model tuning parameters for abutment scour prediction
Fig. 5

The flowchart of the present study

3 Results and Discussion

In this research, a comprehensive ML-based investigation was performed on the normalized scour depth (\({d}_{se}/L\)) at abutments in a uniform bed based on four dimensionless input features, namely \(V/{V}_{c}\), \({F}_{d50}\), \(y/L\), and \(L/{D}_{50}\). Feature selection with a tree-based method, namely GBDT, was used to determine the most significant candidate input combinations. As a result of this pre-processing step, two scenarios, one comprising all features (Comb 1) and one comprising all features except \({F}_{d50}\) (Comb 2), were fed to the ML models to model the normalized scour depth around abutments. Table 5 summarizes the goodness-of-fit statistics of the simulation of the normalized scour depth at the abutment. According to Table 5, the CatBoost model in Comb 1 achieved the best accuracy in training (R = 0.9993, RMSE = 0.0290, MAPE = 0.5069%, and U95% = 0.0569) and testing (R = 0.9685, RMSE = 0.1784, MAPE = 10.4724%, and U95% = 0.0612), outperforming the Extra Tree (R = 0.9491, RMSE = 0.2236, MAPE = 11.3301%, and U95% = 0.0979 for the testing phase) and KNN models (R = 0.9251, RMSE = 0.2778, MAPE = 18.4478%, and U95% = 0.1503 for the testing phase). Moreover, for the second input combination (Comb 2), the CatBoost model, with the highest values of R = 0.9608 and IA = 0.9790 and the lowest error metrics (RMSE = 0.1995, MAPE = 12.2251%, and U95% = 0.0759) in the testing stage, performed best among the three models, followed by the Extra Tree (R = 0.9378, RMSE = 0.2482, MAPE = 12.4341%, and U95% = 0.1193) and KNN (R = 0.9122, RMSE = 0.2936, MAPE = 21.3313%, and U95% = 0.9545) models, respectively. Overall, based on the goodness-of-fit statistics reported in Table 5, it can be concluded that the predictive performance of Comb 1, which includes all input features, is superior to that of Comb 2.

Table 5 Statistical metrics of the predicted abutment scour depths for Comb 1 and 2

To validate the models for estimating the normalized scour depth at the abutment, several graphical tools and diagnostic analyses were used: scatter plots, Rug-Histogram density distributions, trend variation graphs, Taylor diagrams, and violin plots of residual and relative deviation error.

Figure 6 shows the scatter plots comparing the predicted and measured values of normalized scour depth at the abutments. According to Fig. 6a, all three models fit the actual values of \({d}_{se}/L\) almost perfectly in the training phase of Comb 1. In the testing phase, however, the CatBoost model, shown in green (R = 0.9685), agreed best with the actual values of \({d}_{se}/L\) compared with the Extra Tree (R = 0.9491) and KNN (R = 0.9251) approaches. In Comb 2 (Fig. 6b), the best agreement between the predicted and measured \({d}_{se}/L\) is obtained with the two ensemble-based ML models (CatBoost and Extra Tree), and KNN ranks last in accuracy. A closer comparison of the scatter plots indicates that all three methods perform better in Comb 1 than in Comb 2, as the points align more closely with the 45° line. Next, the Taylor diagrams in Fig. 7 are employed to assess each model against the actual values in terms of the correlation coefficient and standard deviation (Xu et al. 2016). Based on Fig. 7, the CatBoost point for both combinations lies in the correlation range of 0.95 to 0.99 and is closer to the reference point than those of the other methods. In addition, the precision of Comb 1 is superior to that of Comb 2 for all three models in estimating the \({d}_{se}/L\) values. Figure 8 displays the box plots of residual error in the testing stage, which reveal that the CatBoost models, with the lowest residual error range in Comb 1 (1.08) and Comb 2 (1.37), yielded more reliable outcomes than KNN (Comb 1: 1.486; Comb 2: 1.575) and Extra Tree (Comb 1: 1.634; Comb 2: 1.616).
Overall, the diagnostic analyses show that Comb 1 yields more accurate outcomes than Comb 2, and that Extra Tree, as the second-best predictive model, can be considered a reliable model for modeling the \({d}_{se}/L\) values. Furthermore, the results presented in this research show that the ensemble-based machine learning methods (CatBoost and Extra Tree) perform well on strongly non-linear scour depth estimation problems at hydraulic structures, which is also confirmed by previous research.

Fig. 6

a Scatter plots of computed and observed values of dimensionless scour in comb 1 for the training and test data. b Scatter plots of computed and observed values of dimensionless scour in comb 2 for the training and test data

Fig. 7

Taylor diagrams of models in combs 1 and 2

Fig. 8

Box plots of residuals for different models in comb 1 and 2

3.1 Extra Discussion and Comparison

Here, the performance of the ML models with the superior candidate input combination is examined for the prediction of the normalized scour depth at abutments. Figure 9 displays the trend of the \({d}_{se}/L\) values predicted by CatBoost, Extra Tree, and KNN against the measured \({d}_{se}/L\). The CatBoost method best captures the non-linear behavior of the scour depth data points (\({d}_{se}/L\)) in the testing phase, followed by the Extra Tree and KNN methods, respectively. Also, the residual distribution indicates that the KNN model has the largest oscillations among the three ML approaches. Figure 10 depicts the Rug-Histogram of the predicted normalized scour depth (\({d}_{se}/L\)) obtained by the CatBoost, Extra Tree, and KNN models versus the measured \({d}_{se}/L\) for the superior input combination over all data points. Here, the density distribution of the CatBoost model matches the measured \({d}_{se}/L\) more closely than those of the Extra Tree and KNN models. Also, the closer conformity between the bands of predicted and actual data in the CatBoost model confirms its better performance than the other methods in estimating scour depth at abutments.

Fig. 9

Comparison of results and residuals for different models in comb 1

Fig. 10

Comparison of the prediction form of the Rug-Histograms density distribution function

3.2 Comparison of Previously Proposed Relationships and Present Approaches

Experimental scour depths were also compared with those computed by previously proposed empirical relationships, viz. Melville and Coleman (2000) and Dey and Barbhuiya (2005), and by the soft computing approaches, viz. CatBoost, ETR, and KNN. Figure 11a illustrates scatter plots of observed versus computed values of normalized maximum scour depth around the abutment with error bands of ± 20%. One can easily see that the soft computing approaches, viz. CatBoost, ETR, and KNN, predict abutment scour depth more precisely than the empirical relationships. The Melville and Coleman (2000) equation overestimates the abutment scour depth and shows the maximum error for the datasets used, as can be seen in Fig. 11a. Figure 11b illustrates the variation of the error over the total datasets. For CatBoost, ETR, and KNN, approximately 95% of the datasets had less than ± 20% error, whereas more than 60% of the datasets for Dey and Barbhuiya (2005) had less than ± 30% error, and only 40% of the datasets for Melville and Coleman (2000) had less than ± 40% error, as can be seen in Fig. 11b.
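The share of datasets falling inside a given error band can be computed as in the sketch below. It assumes that the percentage error is taken relative to the observed value, which the text does not state explicitly; the function name and sample values are hypothetical.

```python
def fraction_within_band(observed, computed, band=0.20):
    """Fraction of points whose relative error |computed - observed| / |observed| <= band."""
    inside = sum(
        1 for o, c in zip(observed, computed) if abs(c - o) <= band * abs(o)
    )
    return inside / len(observed)

# Hypothetical values with relative errors of 10%, 30%, 30% and 0%.
observed = [1.0, 2.0, 1.0, 4.0]
computed = [1.1, 2.6, 0.7, 4.0]
share_20 = fraction_within_band(observed, computed, band=0.20)
share_40 = fraction_within_band(observed, computed, band=0.40)
```

Evaluating the function at several band widths gives the kind of cumulative error curve shown in Fig. 11b.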

Fig. 11

a Comparison between observed and computed normalized abutment scour depth. b Variation of percentage error v/s total datasets

4 External Validation

Tropsha et al. (2003) established criteria for the external validation of models. These criteria are derived from the known prediction performance of the model. The validation criteria and supporting data for the suggested models are summarized in Table 6. To meet the requirements, K and K' must be between 0.85 and 1.15, and m and n must be smaller than 0.1. For the test dataset, CatBoost's correlation coefficient (R) is 0.968, while for the training dataset, it is 0.999. CatBoost's m and n coefficients (n = -0.064 and m = -0.061 for the test dataset) are greater than those of the other models. As seen in Table 6, all three models met all important criteria in both the training and testing datasets, demonstrating that the models' predictive quality and the significant connection between target and output are not coincidental.
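The slope criteria can be checked with a short sketch. It assumes that K and K' are the slopes of the regression lines through the origin of measured on predicted values and of predicted on measured values, respectively; the text does not define them, and the assignment of K and K' to the two regressions may be reversed in some sources. The coefficients m and n depend on through-origin determination coefficients that are not defined in the text and are therefore left out. The function names and sample values are hypothetical.

```python
def origin_slopes(meas, pred):
    """Slopes of the least-squares lines through the origin.

    k:       meas ~ k * pred
    k_prime: pred ~ k_prime * meas
    """
    s_mp = sum(m * p for m, p in zip(meas, pred))
    k = s_mp / sum(p * p for p in pred)
    k_prime = s_mp / sum(m * m for m in meas)
    return k, k_prime

def slopes_acceptable(k, k_prime, low=0.85, high=1.15):
    """Check the condition 0.85 < K, K' < 1.15 stated in the text."""
    return low < k < high and low < k_prime < high

# Hypothetical values: a close prediction and a strongly biased one.
k_good, kp_good = origin_slopes([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
k_bad, kp_bad = origin_slopes([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Slopes close to one indicate that predictions are neither systematically inflated nor deflated relative to the measurements.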

Table 6 External validation parameters for the created models

5 Conclusion

The purpose of this work was to undertake a complete machine learning analysis to predict the scour depth around the bridge abutment. Three machine learning models were created for this purpose: CatBoost, Extra Tree Regression (ETR), and K-nearest neighbor (KNN). The models were developed using 308 sets of laboratory data (263 existing abutment scour depth datasets and 45 flume experiments conducted at NIT Warangal, India). Four dimensionless parameters, namely the upstream densimetric Froude number (Fd50), the ratio of upstream depth to abutment transverse length (\(y/L\)), the ratio of abutment transverse length to sediment median diameter (\(L/{d}_{50}\)), and the ratio of mean velocity to critical velocity (\(V/{V}_{c}\)), were considered as the model inputs, and the normalized scour depth (\({d}_{se}/L\)) as the model output. Based on the GBDT feature selection method, two combinations, comb1 (\(V/{V}_{c}\), \({F}_{d50}\), \(y/L\), \(L/{D}_{50}\)) and comb2 (\(V/{V}_{c}\), \(y/L\), \(L/{D}_{50}\)), were selected as inputs for the models. The results of this study showed that input combination 1 (comb1), which includes all input variables, provided more accurate results in the test phase for all three models. The CatBoost model performed best in predicting scour depth for both input combinations (RMSE = 0.1784 and R = 0.9685 for comb1 and RMSE = 0.1995 and R = 0.9608 for comb2). In the ET model, input combination 1 (RMSE = 0.2236 and R = 0.9491) also performed better than input combination 2 (RMSE = 0.2482 and R = 0.9378). The ET model performed worse than CatBoost. The KNN model had the weakest results among the models used (RMSE = 0.2778 and R = 0.9251 for comb1 and RMSE = 0.2936 and R = 0.9122 for comb2). Additionally, a comparison of the proposed intelligent models with prior empirical relationships demonstrates the superiority of all the machine learning models developed here.
Finally, external validation established that all prediction techniques satisfied the constraints \(0.85<K, {K}^{^{\prime}}<1.15\) and \((m, n)< 0.1\). The performance of the created models on the test data reveals their ability to generalize effectively.