1 Introduction

Pipelines are often used to convey fluids such as water, petroleum, and gas, and they are frequently embedded in river crossings. During floods, the approaching current and the oscillations induced by wake-vortex shedding can partially erode the bed around such pipelines (Fig. 1). Predicting the scour depth around pipelines is therefore a significant problem in hydraulic engineering [13, 15, 40]. Several investigators have carried out experimental and numerical studies on the prediction of scour below pipelines (e.g., [8, 10–12, 26, 27, 33, 34, 36–38]), and numerous empirical equations have been obtained from these investigations. A substantial shortcoming of these approaches is that the traditional methods are not accurate enough to predict the scour phenomenon.

Fig. 1

Scour process below the pipeline [15]

Hence, artificial intelligence approaches have been used widely to evaluate hydraulic and hydrological problems. For instance, scour around hydraulic structures has been predicted by artificial neural networks (ANNs), machine learning approaches, adaptive neuro-fuzzy inference systems (ANFIS), genetic programming (GP), and linear genetic programming (LGP) (e.g., [2–7, 18, 20–25, 32, 54]). Recently, GMDH networks and their hybrid variants have been used to predict the scour depth around bridge piers and abutments [43, 45, 46]. The reported performances indicated that GMDH networks capture the complexity of the scour process better than empirical equations. GMDH networks have also been utilized to solve a variety of other engineering problems (e.g., [1, 31, 39, 47, 50, 52, 56]). The main objective of this paper is to investigate the efficiency of the GMDH network in predicting the pipeline scour depth and to compare its performance with that of the SVM model and empirical equations.

2 Analysis of parameters affecting scour depth below pipelines

Scour fundamentally takes place under two main flow conditions, namely clear-water and live-bed (e.g., [30, 41, 51]), and the mechanism of the scour process under clear-water conditions is quite distinct from that under live-bed conditions. Dey and Singh [15] (125 data sets) and Moncada-M and Aguirre-Pe [40] (90 data sets) carried out experiments under clear-water and live-bed conditions, respectively. From these experiments, they suggested that the effective parameters on the scour depth below pipelines can be expressed in the form of the following function:

$$ d_{s} = f(U,y,\rho ,\rho_{s} ,\mu ,S_{0} ,B,d_{50} ,D,e,g) $$
(1)

where \( d_{s} \), U, y, \( \rho \), \( \rho_{s} \), \( \mu \), \( S_{0} \), B, \( d_{50} \), D, e, and g are the scour depth, mean flow velocity, normal flow depth, density of water, density of sediment, dynamic viscosity of water, slope of the channel bed, channel width, median diameter of the bed material, diameter of the pipe, gap between the pipe and the originally undisturbed bed, and the acceleration due to gravity, respectively.

Using Buckingham’s Π theorem, eight independent nondimensional parameters were obtained as follows:

$$ d_{s} /D = f(Fr,\text{Re} ,\tau^{ * } ,y/D,D/d_{50} ,e/D,S_{0} ,y/B) $$
(2)

in which Fr, Re, and τ* are the Froude number, the Reynolds number, and nondimensional Shields parameter due to sediment transport, respectively.

$$ Fr = U/\sqrt {g.y} $$
(3)
$$ \text{Re} = UD/\upsilon $$
(4)
$$ \tau^{ * } = u_{ * }^{2} /g.(\rho_{s} /\rho - 1).d_{50} $$
(5)
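As a minimal illustration, the sketch below computes the three nondimensional groups of Eqs. (3)–(5) in Python; the default values of \( \rho_{s} \), \( \rho \), and \( \upsilon \) are assumptions for quartz sediment and water, not values taken from the cited experiments.

```python
import numpy as np

def nondimensional_groups(U, y, D, d50, u_star,
                          rho_s=2650.0, rho=1000.0, g=9.81, nu=1.0e-6):
    """Froude number, pipe Reynolds number, and Shields parameter (SI units)."""
    Fr = U / np.sqrt(g * y)                                   # Eq. (3)
    Re = U * D / nu                                           # Eq. (4)
    tau_star = u_star**2 / (g * (rho_s / rho - 1.0) * d50)    # Eq. (5)
    return Fr, Re, tau_star
```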

Moncada-M and Aguirre-Pe [40] concluded that the influence of y/B on the scour depth can be neglected for very wide channels. Also, the slope of the channel bed, \( S_{0} \), was kept constant throughout their experiments and therefore has no effect on the scour depth within these data sets. Dey and Singh [15] and Moncada-M and Aguirre-Pe [40] investigated the effect of the Reynolds number \( (UD/\upsilon ) \) on the pipeline scour depth. In their experiments, the Reynolds number ranged between \( 8 \times 10^{3} \) and \( 30 \times 10^{3} \) and did not exert any conspicuous influence on the scour depth [44]. Similar experimental results were obtained for the pier scour depth in [14, 16, 17, 42].

In addition, the initial gap between the pipe and the undisturbed erodible bed, e, was treated in two ways [15, 40]: e was zero in the experiments of Dey and Singh [15], whereas it varied between 0 and 10 mm in the experiments of Moncada-M and Aguirre-Pe [40].

Based on the above discussion, the following two functions were adopted for the clear-water and live-bed conditions.

For clear-water conditions [15]:

$$ d_{s} /D = f(Fr,\tau^{ * } ,y/D,D/d_{50} ) $$
(6)

For live-bed conditions [40]:

$$ d_{s} /D = f(e/D,Fr,\tau^{ * } ,y/D,D/d_{50} ) $$
(7)

A detailed discussion of the effective parameters on the pipeline scour depth is given in the literature [44]. The ranges of the data sets are presented in Table 1. The two data sets, corresponding to the live-bed and clear-water conditions, were used for modeling of the pipeline scour depth. For each flow condition, about 75 % of the data sets were selected randomly for training, whereas the remaining 25 % were used for the testing stage.
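A minimal sketch of such a random 75/25 split is given below; the file names are placeholders, and the fixed random seed is an arbitrary assumption made only for repeatability.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data files: X holds the nondimensional inputs of Eq. (6) or (7)
# (one row per experiment), y holds the corresponding observed d_s/D values.
X = np.loadtxt("scour_inputs.csv", delimiter=",")
y = np.loadtxt("scour_ds_over_D.csv", delimiter=",")

# About 75 % of the data for training and 25 % for testing, drawn at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
```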

Table 1 Ranges of used data sets for clear-water and live-bed conditions

Different empirical equations have also been derived for both clear-water and live-bed conditions by several investigations. The following empirical equations were used here to predict the pipeline scour depth:

$$ d_{s} /D = 4.706(U_{0} /U_{C} )^{0.89} (U_{0} /gy)^{1.48} + 0.06 $$
(8)
$$ d_{s} /D = 0.9\tanh (1.4Fr) + 0.55 $$
(9)
$$ d_{s} /D = 2\sec h(1.7e/D) $$
(10)

Equation (8) was derived from the experiments of Ibrahim and Nalluri [27] for clear-water conditions. In addition, Eqs. (9) and (10) were proposed by Moncada-M and Aguirre-Pe [40] on the basis of their experiments.
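For reference, the sketch below transcribes Eqs. (8)–(10) directly as printed above; the function and argument names are assumptions, and sech is evaluated as 1/cosh.

```python
import numpy as np

def eq8_ibrahim_nalluri(U0, Uc, y, g=9.81):
    """Eq. (8): clear-water scour, Ibrahim and Nalluri [27]."""
    return 4.706 * (U0 / Uc)**0.89 * (U0 / (g * y))**1.48 + 0.06

def eq9_moncada_aguirre(Fr):
    """Eq. (9): scour as a function of the Froude number [40]."""
    return 0.9 * np.tanh(1.4 * Fr) + 0.55

def eq10_moncada_aguirre(e_over_D):
    """Eq. (10): scour as a function of the gap ratio e/D [40]."""
    return 2.0 / np.cosh(1.7 * e_over_D)
```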

3 Group method of data handling (GMDH) model

GMDH is a learning machine based on the principle of heuristic self-organization, proposed by Ivakhnenko in the 1960s. It is an evolutionary computation technique that performs a series of operations (seeding, rearing, crossbreeding, and selection and rejection of seeds) corresponding to the determination of the input variables, the structure and parameters of the model, and the selection of the model according to a termination principle [1, 29].

In fact, the GMDH network is a very flexible algorithm and can be hybridized with evolutionary and iterative algorithms such as the genetic algorithm (GA) [1, 39], GP [28, 43], particle swarm optimization (PSO) [48], and back propagation [43, 45, 50, 52]. Previous research has established that such hybridizations solve problems successfully in different fields of engineering.

By means of the GMDH algorithm, a model can be represented as a set of neurons in which different pairs of neurons in each layer are connected through a quadratic polynomial and thus produce new neurons in the next layer. Such a representation can be used in modeling to map inputs to outputs. The formal definition of the system identification problem is to find a function \( \hat{f} \) that can be used approximately instead of the actual function f, in order to predict, for a given input vector \( X = (x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} ) \), an output \( \hat{y} \) as close as possible to the actual output y. Therefore, given M observations of multi-input, single-output data pairs such that

$$ y_{i} = f(x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{in} )\quad \left( {i = 1, 2, \ldots ,M} \right) $$
(11)

It is now possible to train a GMDH network to predict the output value \( \hat{y}_{i} \) for any given input vector \( X = (x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{in} ) \), that is:

$$ \hat{y}_{i} = \hat{f}(x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{in} )\quad \left( {i = 1, 2, \ldots M} \right) $$
(12)

In order to solve this problem, the GMDH builds the general relationship between the output and input variables in the form of a mathematical description, which is also called the reference function.

The problem is now to determine a GMDH network such that the sum of squared differences between the actual output and the predicted one is minimized, that is:

$$ \sum\limits_{i = 1}^{M} {\left[ {\hat{f}\left( {x_{i1} ,x_{i2} ,x_{i3} , \ldots ,x_{in} } \right) - y_{i} } \right]}^{2} \to \hbox{min} . $$
(13)

The general connection between the input and output variables can be expressed by a complicated discrete form of the Volterra functional series:

$$ y = a_{0} + \sum\limits_{i = 1}^{n} {a_{i} x_{i} + } \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {a_{ij} x_{i} x_{j} } } + \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {\sum\limits_{k = 1}^{n} {a_{ijk} x_{i} x_{j} x_{k} } } } + \ldots , $$
(14)

which is known as the Kolmogorov–Gabor polynomial [1, 19, 29, 35, 49].

The polynomial order of the partial descriptions (PDs) is the same in each layer of the network; that is, the order of the polynomial of each polynomial neuron (PN) is kept the same across the entire network. For example, assume that the polynomials of the PNs located in the first layer are of second order (quadratic):

$$ \hat{y} = G(x_{i} ,x_{j} ) = a_{0} + a_{1} x_{i} + a_{2} x_{j} + a_{3} x_{i} x_{j} + a_{4} x_{i}^{2} + a_{5} x_{j}^{2} $$
(15)

Here, all polynomials of the neurons of each layer of the network are the same, and the design of the network is based on the same procedure.

The weighting coefficients in Eq. (15) are calculated using regression techniques [1, 19] so that the difference between the actual output, \( y \), and the calculated one, \( \hat{y} \), for each pair of input variables \( x_{i} \), \( x_{j} \) is minimized. In this way, the weighting coefficients of the quadratic function \( G_{i} \) are obtained so as to optimally fit the output over the whole set of input–output data pairs, that is:

$$ E = \frac{{\sum\nolimits_{i = 1}^{M} {\left( {y_{i} - G_{i} ()} \right)^{2} } }}{M} \to \hbox{min} . $$
(16)
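A minimal sketch of this fitting step is shown below: the six coefficients of Eq. (15) for one neuron are obtained by ordinary least squares, which minimizes the error E of Eq. (16); the function names are illustrative only.

```python
import numpy as np

def fit_quadratic_pn(xi, xj, y):
    """Least-squares estimate of a0..a5 in Eq. (15) for one polynomial neuron."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def eval_quadratic_pn(coeffs, xi, xj):
    """Evaluate Eq. (15) with the fitted coefficients."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi**2 + a5*xj**2
```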

3.1 Application of BP algorithm in the topology design of GMDH network

In this section, the GMDH network was improved using the back propagation algorithm. The method included two main steps. First, the weighting coefficients of the quadratic polynomials were determined by the least squares method from the input layer to the output layer along a forward path. Second, the weighting coefficients were updated using the back propagation algorithm along a backward path. This mechanism was repeated until the training error of the network (E) was minimized. Further details of the training stages are presented in the literature [43, 50].
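The sketch below illustrates the idea of the backward path for a single neuron, assuming a plain gradient-descent update of the coefficients that reduces E of Eq. (16); it is a simplified stand-in for the layer-by-layer back propagation detailed in [43, 50], and the learning rate and iteration count are arbitrary assumptions.

```python
import numpy as np

def refine_pn_coeffs(coeffs, xi, xj, y, lr=1e-3, n_iter=200):
    """Backward-path refinement of one neuron's coefficients by gradient descent."""
    a = np.asarray(coeffs, dtype=float)
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])
    for _ in range(n_iter):
        residual = A @ a - y                  # prediction error on the training set
        grad = 2.0 * A.T @ residual / len(y)  # gradient of E in Eq. (16)
        a -= lr * grad                        # backward update of the weights
    return a
```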

From the trained GMDH-BP network, the corresponding polynomials for the live-bed and clear-water conditions are as follows:

For live-bed conditions:

$$ (d_{s} /D)_{1}^{1} = 0.757 - 0.617e/D + 0.0225D/d_{50} + 0.00515e/D.D/d_{50} - 0.3506(e/D)^{2} - 0.00032(D/d_{50} )^{2} $$
(17)
$$ (d_{s} /D)_{2}^{1} = 1.83 - 0.0036D/d_{50} - 0.456D/y - 0.00575D/d_{50} .D/y - 0.00016(D/d_{50} )^{2} + 0.015(D/y)^{2} $$
(18)
$$ (d_{s} /D)_{3}^{1} = 1.287 - 0.352D/y + 2.1627\tau^{ * } - 0.2676D/y.\tau^{ * } + 0.0165(D/y)^{2} - 4.049\tau^{ * 2} $$
(19)
$$ (d_{s} /D)_{8}^{1} = 0.575 - 1.149e/D + 3.667\tau^{ * } + 0.545e/D.\tau^{ * 2} + 0.3084(e/D)^{2} - 5.237\tau^{ * 2} $$
(20)
$$ (d_{s} /D)_{2}^{2} = - 0.362 + 0.9441(d_{s} /D)_{1}^{1} + 0.7054(d_{s} /D)_{3}^{1} + 0.3604(d_{s} /D)_{1}^{1} .(d_{s} /D)_{3}^{1} - 0.3981((d_{s} /D)_{1}^{1} )^{2} - 0.20835((d_{s} /D)_{3}^{1} )^{2} $$
(21)
$$ (d_{s} /D)_{5}^{2} = - 0.119 - 0.0266(d_{s} /D)_{2}^{1} + 1.0267(d_{s} /D)_{8}^{1} - 0.636(d_{s} /D)_{8}^{1} .(d_{s} /D)_{2}^{1} + 0.64((d_{s} /D)_{2}^{1} )^{2} + 0.115((d_{s} /D)_{8}^{1} )^{2} $$
(22)
$$ (d_{s} /D)_{1}^{3} = - 0.00057 + 3.098(d_{s} /D)_{2}^{2} - 2.0138(d_{s} /D)_{5}^{2} - 8.665(d_{s} /D)_{5}^{2} .(d_{s} /D)_{2}^{2} + 2.1((d_{s} /D)_{2}^{2} )^{2} + 6.459((d_{s} /D)_{5}^{2} )^{2} $$
(23)

and for clear-water conditions:

$$ (d_{s} /D)_{1}^{1} = 0.246 - 0.00026D/d_{50} + 0.464y/D + 0.00081y/D.D/d_{50} - 3.64 \times 10^{ - 5} (D/d_{50} )^{2} - 0.053(y/D)^{2} $$
(24)
$$ (d_{s} /D)_{3}^{1} = - 1.08 + 67.78\tau^{ * } + 4.11Fr - 60.966\tau^{ * } .Fr + 584.55(\tau^{ * } )^{2} - 8.485(Fr)^{2} $$
(25)
$$ (d_{s} /D)_{4}^{1} = 5.94 - 0.062D/d_{50} - 269.746\tau^{ * } + 2.255D/d_{50} .\tau^{ * } + 2.05 \times 10^{ - 5} (D/d_{50} )^{2} + 3565.247\tau^{ * 2} $$
(26)
$$ (d_{s} /D)_{1}^{2} = 2.15 - 0.977(d_{s} /D)_{1}^{1} - 2.295(d_{s} /D)_{3}^{1} + 1.05(d_{s} /D)_{1}^{1} .(d_{s} /D)_{3}^{1} + 0.2149((d_{s} /D)_{1}^{1} )^{2} + 0.785((d_{s} /D)_{3}^{1} )^{2} $$
(27)
$$ (d_{s} /D)_{2}^{2} = 6.24 - 4.095(d_{s} /D)_{2}^{1} - 6.684(d_{s} /D)_{4}^{1} + 6.166(d_{s} /D)_{4}^{1} .(d_{s} /D)_{2}^{1} - 0.856((d_{s} /D)_{2}^{1} )^{2} + 0.1977((d_{s} /D)_{4}^{1} )^{2} $$
(28)
$$ (d_{s} /D)_{1}^{3} = 0.2 + 0.83(d_{s} /D)_{1}^{2} - 0.253(d_{s} /D)_{2}^{2} + 1.113(d_{s} /D)_{1}^{2} .(d_{s} /D)_{2}^{2} - 0.679((d_{s} /D)_{1}^{2} )^{2} - 0.22((d_{s} /D)_{2}^{2} )^{2} $$
(29)

The superscript and subscript of each term denote the number of the corresponding layer and neuron, respectively.
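For illustration, the sketch below evaluates the live-bed cascade of Eqs. (17)–(23); the coefficients are transcribed as printed, so any misprint in the source equations is reproduced, and the function and argument names are assumptions.

```python
def gmdh_bp_live_bed(e_D, D_d50, D_y, tau):
    """Evaluate the live-bed GMDH-BP polynomials, Eqs. (17)-(23); returns d_s/D."""
    y11 = (0.757 - 0.617 * e_D + 0.0225 * D_d50 + 0.00515 * e_D * D_d50
           - 0.3506 * e_D**2 - 0.00032 * D_d50**2)                        # Eq. (17)
    y21 = (1.83 - 0.0036 * D_d50 - 0.456 * D_y - 0.00575 * D_d50 * D_y
           - 0.00016 * D_d50**2 + 0.015 * D_y**2)                         # Eq. (18)
    y31 = (1.287 - 0.352 * D_y + 2.1627 * tau - 0.2676 * D_y * tau
           + 0.0165 * D_y**2 - 4.049 * tau**2)                            # Eq. (19)
    y81 = (0.575 - 1.149 * e_D + 3.667 * tau + 0.545 * e_D * tau**2
           + 0.3084 * e_D**2 - 5.237 * tau**2)                            # Eq. (20)
    y22 = (-0.362 + 0.9441 * y11 + 0.7054 * y31 + 0.3604 * y11 * y31
           - 0.3981 * y11**2 - 0.20835 * y31**2)                          # Eq. (21)
    y52 = (-0.119 - 0.0266 * y21 + 1.0267 * y81 - 0.636 * y81 * y21
           + 0.64 * y21**2 + 0.115 * y81**2)                              # Eq. (22)
    return (-0.00057 + 3.098 * y22 - 2.0138 * y52 - 8.665 * y52 * y22
            + 2.1 * y22**2 + 6.459 * y52**2)                              # Eq. (23)
```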

4 Support vector machines (SVM)

Support vector machines, like ANNs, are a kind of data-mining approach. SVMs have been applied successfully to a range of problems, from particle identification, facial identification, and text categorization to engine knock detection, bioinformatics, and database marketing. The classification problem is commonly used to introduce the basic concepts behind SVMs and to examine their strengths and weaknesses from a data-mining perspective [9]. The regression form of the SVM (support vector regression) is obtained by modifying the classification algorithm. To develop an SVM model for each process, two main choices must be made, namely the regularization parameter (C) and the type of kernel (polynomial or Gaussian radial basis function). In this study, the radial basis function kernel was used to minimize the training error for both scour data sets. The regularization parameter C and the width of the error-insensitive zone control the complexity of the prediction. Further details of the SVM algorithm are presented in the literature [53, 55].
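A minimal sketch of such an SVR set-up with scikit-learn is given below, reusing the hypothetical training split from the earlier data-split sketch; the numerical values of C, epsilon, and gamma are placeholders, not the values tuned in this study.

```python
from sklearn.svm import SVR

# RBF-kernel support vector regression: C is the regularization parameter and
# epsilon the half-width of the error-insensitive zone (placeholder values).
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
svr.fit(X_train, y_train)          # nondimensional inputs -> d_s/D
ds_D_pred = svr.predict(X_test)
```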

5 Results and discussion

The statistical results of the GMDH networks for both live-bed and clear-water conditions are presented in this section, and the performance results are compared with those obtained using the SVM model and the empirical equations. The correlation coefficient (R), root mean square error (RMSE), and mean absolute percentage of error (MAPE) are commonly used indicators of prediction error in the testing stage [2–7, 22, 23, 43].
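One common way to compute these indicators is sketched below; the exact MAPE convention (fractional versus percentage) used in the original study is not stated, so the fractional form is assumed here.

```python
import numpy as np

def error_metrics(observed, predicted):
    """Correlation coefficient, RMSE, and (fractional) MAPE for a testing stage."""
    obs, pred = np.asarray(observed, float), np.asarray(predicted, float)
    R = np.corrcoef(obs, pred)[0, 1]
    rmse = np.sqrt(np.mean((obs - pred)**2))
    mape = np.mean(np.abs((obs - pred) / obs))
    return R, rmse, mape
```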

For the clear-water condition, testing results for the GMDH-BP and the SVM model are given in Table 2. The GMDH-BP predicted the scour depth with lower error (RMSE = 0.077 and MAPE = 0.87) and higher accuracy (R = 0.96) than the SVM model (R = 0.93, RMSE = 0.23, and MAPE = 0.6). Also, the statistical results of the empirical equation indicated that Eq. (8) produced a remarkably higher scour prediction error (RMSE = 0.9 and MAPE = 1.96) and a lower correlation coefficient (R = 0.22) than the GMDH-BP and SVM models (Fig. 2).

Table 2 Statistical results of performances for both flow conditions
Fig. 2

Scatter plot of observed and predicted scour depth using the GMDH-BP, SVM models, and empirical equation for live-bed condition

For the live-bed condition, the performance results of the proposed artificial intelligence approaches indicated that the GMDH-BP predicted the scour depth with lower error (RMSE = 0.161 and MAPE = 0.81) and higher accuracy (R = 0.97) than the SVM model (R = 0.95, RMSE = 0.14, and MAPE = 0.63). From Table 2, it can be seen that Eq. (9) produced a relatively higher error (RMSE = 0.46 and MAPE = 1.57) and a lower correlation coefficient (R = 0.31) than Eq. (10).

In fact, Eqs. (9) and (10) depend only on Fr and e/D, respectively, which limits their applicability to pipeline scour prediction. Also, Eq. (10) predicted the scour depth more accurately than Eqs. (8) and (9). Scatter plots of predicted versus observed scour depth values in the testing stage are illustrated in Figs. 3 and 4 for the live-bed and clear-water flow conditions, respectively. Furthermore, the GMDH-BP provided a lower scour prediction error for the clear-water condition (RMSE = 0.073 and MAPE = 0.197) than for the live-bed condition (RMSE = 0.161 and MAPE = 0.81).

Fig. 3

Scatter plot of observed and predicted scour depth using the GMDH-BP, SVM model, and empirical equations for clear-water condition

Fig. 4

Modeling of pipe position on scour depth for live-bed condition

To clarify the new contributions of this study, the GMDH-BP was applied to investigate the variation of \( d_{s} /D \) with y/D for different \( d_{50} \) values (0.48, 0.81, 1.86, 2.54, and 3 mm). For the clear-water condition, the results indicated that the GMDH-BP predicted the scour depth below the pipeline for \( d_{50} \) = 0.48 mm with a lower error (MAPE = 0.6) than for the other cases. From Table 3, it can be noted that the GMDH-BP provided more accurate predictions of the scour depth for the fine sediment size than for the coarse sediment sizes.

For the live-bed condition, the robustness of the GMDH-BP was examined by investigating the effects of e/D on \( d_{s} /D \). Variations of \( d_{s} /D \) versus e/D for different ranges of Fr values are shown in Fig. 4. The statistical results indicated that the GMDH-BP predicted the scour depth in the 0.2–0.4 range of Fr with a lower error (MAPE = 1.9) than in the other Fr ranges (Table 3).

Table 3 Effects of input parameters on GMDH-BP performances for both clear-water and live-bed conditions

6 Conclusions

In this study, the scour depth below pipelines under clear-water and live-bed conditions was predicted using the GMDH-BP, the SVM model, and empirical equations. The parameters affecting the scour depth were determined using dimensional analysis, and two functions were defined to develop the GMDH network for the clear-water and live-bed conditions. The weighting coefficients of the quadratic polynomials of the GMDH network were trained using the back propagation algorithm. The performance results indicated that the GMDH network predicted the scour depth with relatively lower error and higher accuracy (R = 0.967, RMSE = 0.073, and MAPE = 0.197) for the clear-water and live-bed conditions, compared to the SVM model. For the clear-water condition, the robustness tests of the proposed GMDH-BP showed that it yielded more accurate scour predictions (MAPE = 0.6) for the fine sediment size than for the coarse sediment sizes. Furthermore, the GMDH-BP predicted the variation of \( d_{s} /D \) versus e/D in the 0.2–0.4 range of Fr with a relatively lower error (MAPE = 1.9) than in the other Fr ranges. In general, the application of the GMDH network to the pipeline scour problem demonstrated that this algorithm can capture the complexity and physical behavior of the scour process below pipelines.