1 Introduction

In rivers, hydrological measurements such as the discharge, velocity, turbulence, and depth are essential for the design and implementation of river training works, calculating pollutant fluxes, interpretation of water quality data and for water resource management. The simplest of the several methods of the river flow discharge prediction is the Manning’s equation which, although developed for conditions of uniform flow in open channels, may give an adequate estimate of the nonuniform and unsteady flow which is usual in natural channels, provided that no significant flood occurs in such a way the river overflows its banks.

In time of extreme flood, inundation of floodplains provides high-velocity gradients across the river. This is owing to the fact that in general, floodplains have higher roughness coefficient than the main channel. In such a condition, the usual practice is to divide the cross-section of the river into floodplains and main channel as shown in Fig. 1 and calculate discharge at each of these sections using Manning’s equation. Then, the total discharge will be computed by adding these section’s discharges. This conventional procedure known as the vertical divided channel method (VDCM) does not include the large apparent shear stress induced at the main channel–floodplain interfaces resulting from high-velocity lateral gradient. Ignoring this apparent shear stress, which in some cases is may be greater than the bed shear stress, causes the computed discharge to be much greater than the actual discharge. Field and laboratory experiments conducted by Martin and Myers [30] and Lai and Bessaih [24] indicated that the maximum error caused by VDCM may be up to 40 and 60 %, respectively.

Fig. 1
figure 1

Three-dimensional flow structures in a compound channel [38]

There are many studies on the flow characteristics of compound channels, especially the flow conveying capacity. Of these many investigations, works of SKM [38, 39], COH [1], and EDM [7] have very good accuracy and hence very wide applications in flow discharge computations of compound channels. However, these methods are not straightforward to be generally applicable and may suffer from long computational time [12, 13, 14, 15], requiring numerical methods or solution of differential equations. To simplify the computations of overbank hydraulics, in recent years, some simple and new approaches have been proposed which lead to accurate results [17]. MacLeod [29] and Liu and James [25] used the neural networks for flow discharge prediction in meandering compound channels. Sharifi [36] and Sharifi et al. [37] applied genetic algorithm for conveyance estimation in simple and compound channels. Zahiri and Dehghani [47] and Unal et al. [45] used artificial neural networks for discharge prediction in straight compound channels. Azamathulla and Zahiri [2] applied linear genetic programming for prediction of flow discharge in compound open channels.

In this paper, based on sufficient stage-discharge data sets with high quality from 30 field and laboratory compound channels, two simple and more accurate algorithms known as linear genetic programming (LGP) and M5 tree decision models have been proposed for flow conveyance capacity. Using these two precise methods, a dimensionless equation and linear set of equations have been developed for flow discharge computation. Besides the fast computing and high accuracy of the new algorithms, the promising point of this study which certainly leads to general applicability of the results is selecting only 3 input parameters in LGP and M5 models.

2 Materials and methods

2.1 Flow hydraulic structures in compound channels

Compound channels generally include a deep main channel and two flat floodplains (Fig. 1). These sections are very useful and important for river environment and ecosystem and also for efficient flood deviation. The flow mechanisms in straight compound channels are now well understood [23]. The fast flow in the main channel is retarded by the slower flow on the floodplains, causing lateral momentum transfer. The shear layer that develops at the interface of the main channel and the floodplain affects turbulence structures and leads to stream-wise and vertical vortices development (Fig. 1). These effects have been observed experimentally by Shiono and Knight [38] and Tominaga and Nezu [44] using fiber-optic laser-Doppler anemometer (LDA).

2.1.1 Vertical divided channel method

In Fig. 2, a typical prismatic compound section with three subdivided sections separated by three methods (e.g., horizontal, vertical, and diagonal) is shown. Total flow discharge is the sum of discharges calculated separately in each subsection using an appropriate conventional friction formula, for example, Manning’s equation [10]:

$$ Q_{\text{VDCM}} = \sum\limits_{i = 1}^{N} {Q_{i} } = \sum\limits_{i = 1}^{N} {\frac{{A_{i} R_{i}^{2/3} S_{0}^{1/2} }}{{n_{i} }}} $$
(1)

where Q VDCM is total flow discharge in compound channel, A is area, R is hydraulic radius, S 0 is bed slope, and n is Manning’s roughness coefficient. In this equation, i refers to each subsection (main channel or floodplains), and N is total number of subsections. The vertical divided channel method is the most applicable method in engineering tools and softwares (e.g., HEC-RAS, MIKE11, ISIS, SOBEC, etc.). This method has great over-prediction error for discharge estimation in field and laboratory compound sections [1].

Fig. 2
figure 2

Analysis of a typical compound channel by vertical dividing

2.1.2 Coherence parameter

For modifying VDCM, Ackers [1] introduced a semi-empirical method based on the dimensional analysis and using large experimental data of FCF. In this method, a correction factor named as discharge adjustment factor (DISADF) is defined and then multiplied by Q VDCM to obtain a flow discharge as close as actual discharge. In the case of homogeneous compound channels, variations of this correction factor with respect to depth ratio consist of four distinct regions. An example of this flow classification is shown in Fig. 3. For each region, some special equations are defined and then, based on some conditional criteria, the appropriate region will be selected. The main concept of Ackers method relies on coherence parameter. Coherence is defined as the ratio of the basic conveyance (treating the compound channel as a single unit) to that computed by VDCM. By this definition, the following equation can be used to calculate coherence parameter:

$$ {\text{COH}} = \frac{{A^{5/3} /\left( {\sum\nolimits_{i = 1}^{N} {n_{i}^{1.5} P_{i} } } \right)^{2/3} }}{{\sum\nolimits_{i = 1}^{N} {\frac{1}{{n_{i} }}\frac{{A_{i}^{5/3} }}{{P_{i}^{2/3} }}} }} $$
(2)

where P i and A i are the wetted perimeter and the area of subsections, respectively. P and A are total wetted perimeter and area of the channel, and N is the number of separate subsections.

Fig. 3
figure 3

Variations of flow discharge adjustment factor versus depth ration in a homogenous compound channel

2.2 A review of linear genetic programming

Genetic programming (GP) is a methodology for automatically solving problems using computers and was inspired by biological evolution [19]. GP is a type of genetic algorithm. It is a pattern for learning the most “best fit” computer programs by means of artificial evolution [2]. The GP and GA methods are similar in most aspects: both initialise a population and compound the random members known as chromosomes (individual). Then, the fitness of each chromosome is evaluated with respect to a final amount. Afterward, to reproduce “fitter” programs, the concept of Darwinian natural selection is used. The main difference between GP and the GA is the representation of the solution. GP creates unequal and equal computer programs, which consist of several mathematical operators and variable sets as the solution. However, the GA creates equal-length chromosomes (strings of numbers) in the form of binary or real numbers, which represent the solution. The system can be composed of function calls (such as e x, x, sin, cos, tan, log, sqrt, ln, power) and mathematical operations (+, −, /, *). Each function implicitly includes an assignment to a variable, which facilitates the use of multiple program outputs in GP, whereas, in tree-based GP, those side effects must be incorporated explicitly [9].

LGP (linear genetic programming), which is an extension to conventional tree-based GP, evolves developing sequences of instructions from an imperative programming language (C or C++) or from a machine language clarify. The name “linear” refers to the structure of the (imperative) program representation and does not reflect functional genetic programs that are restricted to a linear list of nodes only. On the contrary, genetic programs normally represent highly nonlinear solutions [8]. The main differences to conventional tree-based GP are the graph-based data flow that results from a multiple usage of indexed variable (register) contents and the existence of structurally ineffective code (introns) [8]. This concept was expanded to the Automatic Induction of Machine code by Genetic Programming Technique (AIMGP), in which the solutions are directly computed, as binary machine codes and executed without using an interpreter; thus, in this way, the computer program can evolve very quickly [2, 9].

2.3 M5 tree model

Successful applications of machine learning techniques and especially model decision tree in civil engineering, water resources, and flood management [3, 4, 11, 28, 33, 35, 4042] have inspired the exploration of its applicability to modeling the complex flow discharge curve in compound open channels. Although M5 model tree [34]–based modeling is not as popular as ANN, GA, and GP, it has been proved to be very efficient and robust in many fields of science and engineering applications [40]. In this model, based on the most effective input variables, data are classified into some distinct groups and then, a set of linear regression equations are proposed to estimate the output variable for each group. Easy of computations and accuracy of predicted results are the most important advantages of M5 model. One of the main objectives of this study is to use M5 model tree for developing stage-discharge curve for flume and river compound channels. The predictive accuracy of this model is compared with VDCM and GP.

A complex modeling problem can be solved by dividing it into a number of subproblem tasks and combining the solutions of these subproblems. This idea is used in building model trees (MT) in the following way: The parameter space was split into areas (subspaces) and a linear regression model was built in each of them, which is an “expert” for that subspace. In fact, the resulting model can be seen as a modular model, or a committee machine [18], with the linear models being specialized on the particular subsets of the input data set. The algorithm used to build MT is called M5 [34] and is based on the principle of information theory. The algorithm makes it possible to split the multidimensional parameter space into subspaces and to generate the models automatically for each subspace according to an overall quality criterion.

The construction of a model tree is similar to that of decision trees. Figure 4 illustrates how the splitting of space is done. First, the initial tree is built and then the initial tree is pruned (reduced) to overcome the over-fitting problem (that is a problem when a model is very accurate on the training data set and fails on the test set). Finally, the smoothing process is employed to compensate for the sharp discontinuities between adjacent linear models at the leaves of the pruned tree (this operation is not needed in building the decision tree).

Fig. 4
figure 4

Splitting the input space X1 × X2 by M5 model tree algorithm (left) and its mathematical definition (right). Models 1–6 are linear equations [41]

The splitting criterion for the M5 model tree algorithm is based on treating the standard deviation of the class values that reach a node as a measure of the error at that node and calculating the expected reduction in this error as a result of testing each attribute at that node. The formula to compute the standard deviation reduction (SDR) is as follows:

$$ {\text{SDR}} = {\text{sd}}(T) - \sum {\frac{{\left| {T_{i} } \right|}}{\left| T \right|}{\text{sd}}(T_{i} )} $$
(3)

where T represents a set of examples that reaches the node; T i represents the subset of examples that have the ith outcome of the potential set; and sd represents the standard deviation. After examining all the possible splits, M5 chooses the one that maximizes the expected error reduction. Splitting in M5 terminates when the class values of all the instances that reach a node vary just slightly, or only a few instances remain. This division often produces a large tree like structure that must be pruned back, for instance by replacing a subtree with a leaf. In the final stage, a smoothing process is performed to compensate for the sharp discontinuities that will inevitably occur between adjacent linear models at the leaves of the pruned tree, particularly for some models constructed from a smaller number of training examples. In smoothing, the adjacent linear equations are updated in such a way that the predicted outputs for the neighboring input vectors corresponding to the different equations are becoming close in value [46].

2.4 Data set

Three hundred and ninety-four flume and field data sets of flow rating curves from 30 different straight compound sections were selected for this study. Most of these data are collected form experimental works carried out by HR Wallingford (FCF) in compound channel flumes with large-scale facility [21]. Also, some experimental data from Blalock and Sturm [5], Knight and Demetriou [20], Lambert and Sellin [25], Myers and Lyness [31], Lambert and Myers [26], Bousmar and Zech [7], Haidera and Valentine [16], Lai and Bessaih [24], and Bousmar et al. [6] have been used. Field data were collected from natural rivers including River Severn at Montford bridge [1, 22], River Main [30], and Rio Colorado [43]. The cross-section of a typical river compound channel with berm inclination is shown in Fig. 5. The ranges of geometric and hydraulic characteristics of compound channels used in this paper are listed in Table 1.

Fig. 5
figure 5

Typical river compound channel cross-section with berm inclination

Table 1 Range of geometric and hydraulic variables of compound channels

2.5 Selection of input and output variables

In this paper, it is assumed, somewhat similar to Ackers [1] approach, that discharge ratio in compound open channels is dependent on three input dimensionless parameters including depth ratio (floodplain depth to main channel depth), coherence parameter, and calculated discharge ratio (Q VDCM to bankfull discharge):

$$ \frac{{Q_{t} }}{{Q_{b} }} = f\left( {{\text{Dr,COH}},\frac{{Q_{\text{VDCM}} }}{{Q_{b} }}} \right) $$
(4)

where Q t is total flow discharge, Q b is bankfull discharge, Dr is depth ratio (ratio of water depth in floodplain to that of main channel), and Q VDCM is flow discharge calculated from Manning’s equation assuming vertical divided planes between main channel and floodplains. The actual flow discharge ratio (actual discharge to bankfull discharge) is selected as output parameter. Of the total data set, approximately 70 % (282 sets) were selected randomly and used for training LGP and M5 models. The remaining 30 % (112 sets) were employed for testing.

3 Results and discussion

3.1 Experimental setup

For the model tree experiment, a model tree was built using the M5 algorithm implemented in WEKA software [46].

3.2 LGP model

In this research, a two-point string crossover was utilized in the LGP. A segment of random length and random position is used in both parents and exchanged between parents. The crossover is abandoned and restarted by exchanging equalized segments when one of the resulting children exceeds the maximum length. The instruction operator is modified by mutation into another symbol over the same set [9, 12]. The following equation is used to compute the fitness of an LGP individual:

$$ f = \sum\limits_{j = 1}^{N} {\left( {\left| {X_{j} - Y_{j} } \right|} \right)} $$
(5)

where Y j is the expected value for fitness case j and X j is the value returned by a chromosome for fitness case j.

In LGP, to avoid overgrowing programs, the maximum size of the program is generally restricted [9]. This configuration was tested for the proposed LGP model and was found to be sufficient. The best individual (program) of a trained LGP can be converted into a functional representation by successive replacements of variables starting with the last effective instruction [32]. In this paper, one basic mathematical function (power) and three basic arithmetic operators (+, −, /) and a large number of generations (5,000) were used for testing. First, the maximum size of each program was specified as 256, starting with 64 instructions for the initial program. Table 2 shows the operational parameters and functional set used in the LGP modeling.

Table 2 Parameters of the optimized LGP model

Using optimization procedure, following relationship has been obtained for training data:

$$ \frac{{Q_{t} }}{{Q_{b} }} = 3.9538{\text{Dr}} \times 0.457^{{{\text{Dr}} + \left( {1 - {\text{COH}}} \right)^{\text{Dr}} }} + \frac{{Q_{\text{DCM}} }}{{Q_{b} }}^{{\left( {1 - {\text{COH}}} \right)^{{\left( {1 - {\text{COH}}} \right)^{{2^{\text{Dr}} }} }} }} - {\text{Dr}}^{{\left( {1 - {\text{COH}}} \right)^{{\frac{{Q_{\text{DCM}} }}{{Q_{b} }}}} }} \left( {1 - {\text{COH}}} \right)^{{{\text{Dr}}^{2.462} }} $$
(6)

This equation has been applied for training and testing data. The computed discharge ratios (Q t /Q b ) resulted from LGP model for both training and testing data are presented in Fig. 6. From this figure, it is clearly seen that this model in all variable ranges of selected data (laboratory and field sections) has very promised accuracy.

Fig. 6
figure 6

Proposed LGP model discharge ratios for training and testing data

3.3 M5 tree equations

By running WEKA program written for M5 tree modeling for training data, four linear models have been obtained based on mainly variations of depth ratio. It is interesting to note that these results are very similar to regions characteristics obtained by Ackers [1].

$$ {\text{LM}}1:\frac{{Q_{t} }}{{Q_{b} }} = 0.0289\frac{{Q_{\text{VDCM}} }}{{Q_{b} }} + 2.147{\text{Dr}} - 0.6225\left( {1 - {\text{COH}}} \right) + 1.1309 $$
(7)
$$ {\text{LM}}2:\frac{{Q_{t} }}{{Q_{b} }} = 0.0096\frac{{Q_{\text{VDCM}} }}{{Q_{b} }} + 2.1164{\text{Dr}} - 23.3021\left( {1 - {\text{COH}}} \right) + 3.2013 $$
(8)
$$ {\text{LM}}3:\frac{{Q_{t} }}{{Q_{b} }} = - 0.07\frac{{Q_{\text{VDCM}} }}{{Q_{b} }} + 4.3191{\text{Dr}} - 1.9831\left( {1 - {\text{COH}}} \right) + 1.3832 $$
(9)
$$ {\text{LM}}4:\frac{{Q_{t} }}{{Q_{b} }} = 0.6592\frac{{Q_{\text{VDCM}} }}{{Q_{b} }} + 2.8378{\text{Dr}} - 2.3681\left( {1 - {\text{COH}}} \right) + 0.4209 $$
(10)

These linear models have been applied to training and testing data. Figure 7 shows these results. As can be seen, both predicted values of training and testing data are well comparable with observed values.

Fig. 7
figure 7

Proposed M5 model discharge ratios for training and testing data

In Fig. 8, the calculated discharge ratios for VDCM are presented in comparison with the LGP and M5 models. As can be seen, the VDCM approach, in all over the data ranges, over-estimates the discharge ratios with large errors. This is due to ignoring the momentum exchange phenomenon developed at the main channel–floodplains interface lines. Based on these predictions, maximum relative errors of discharge ratios for VDCM, LGP, and M5 tree models have been calculated as 355, 80, and 53 %, respectively.

Fig. 8
figure 8

Comparison of proposed LGP model, M5 Tree model and VDCM results of discharge ratios for all data

4 Result statistical analysis

To validate the results for the training and testing sets, several common statistical measures are used, such as R 2 (coefficient of determination), RMSE (root mean square error), AE (average error), and δ (mean absolute deviation):

$$ R^{2} = \left( {\frac{{\sum {xy} }}{{\sqrt {\sum {x^{2} } \sum {y^{2} } } }}} \right)^{2} $$
(11)
$$ {\text{RMSE}} = \sqrt {\frac{{\sum {\left( {X - Y} \right)^{2} } }}{n}} $$
(12)
$$ {\text{AE}} = \frac{{\sum {\frac{X - Y}{X}} \times 100}}{n} $$
(13)
$$ \delta = \frac{{\sum {\left| {X - Y} \right|} }}{\sum X } \times 100 $$
(14)

where \( x = (X - \overline{X} ) \), \( y = (Y - \overline{Y} ) \), X are the observed values, \( \overline{X} \) is the mean of X, Y is the predicted value, \( \overline{Y} \) is the mean of Y, and n is the number of samples. These parameters have been calculated for training, testing, and all data for VDCM and proposed LGP and M5 tree models. The tabular results of statistical parameters are shown in Table 3. Based on these error statistics, it is revealed that the LGP model with R 2 = 0.98 and RMSE = 0.276 for all data has the best accuracy, while the VDCM with R 2 = 0.69 and RMSE = 1.723 has the worst accuracy. M5 tree model has nearly the same accuracy as the LGP model and hence can be applied for river flow discharge prediction in flood events. This comparison clearly shows that VDCM has failed to produce accurate results compared with the proposed LGP and M5 models. The present LGP model with high R 2 = 0.97 has well prediction of flow rate compared with Azamathulla and Zahiri [2] which have very low R 2 = 0.929 for testing data set.

Table 3 Evaluation of models

5 Conclusions

LGP model was developed to estimate the exact values of the relative flow discharge using 394 high-quality data from laboratory and field compound channels. The relative flow rates computed from proposed equation have considerably high accuracy and low error in all over the selected data ranges, while the conventional method of VDCM has very large errors (over-estimate). Also, the present LGP model is more accurate than that proposed by Azamathulla and Zahiri [2]. The present study clearly indicates that the LGP-based formula is more useful for any condition without limitations.