1 Introduction

Weirs are among the most important hydraulic structures. They pass floods from dam reservoirs, divert water from canals and can be used as discharge-measurement devices in channels. The safety of water-conveying channels and the security of dams are closely tied to the adequacy of the weir's capacity; most of the damage to water-conveying channels, and even to dam spillways, is caused by weirs with insufficient capacity [1, 2]. When the water level behind a weir rises above its crest, flow passes over it. The velocity profile over the weir is curved and nonlinear, and if the discharge in the channel decreases, the flow over the weir decreases as well [3]. Given the sensitivity of its function, a strong, safe and highly efficient weir structure must therefore be selected, and it has to be ready for operation at any time. Determining the discharge coefficient (\(C_{d}\)) of a weir is generally one of the most important design tasks, with a major role in reducing the structural and financial damage caused by floods. A proper understanding of how weirs perform can also significantly reduce their construction costs.

The most important types of weirs include sharp-crested, broad-crested, ogee, labyrinth, shaft, side, stepped and siphon weirs [4–6]. In the last decade, researchers have applied new methods, known as soft computing or intelligent methods, which are desirably efficient and accurate for solving complicated problems related to the discharge capacity of weirs, based on the hydraulic parameters of the flow and the geometry of the weir. The key hydraulic parameters in this respect are the Froude number upstream of the weir, the flow depth, the crest height and the weir height. Emiroglu et al. [7], Subramanya and Awasthy [8], Swamee et al. [9], Bagheri and Heidarpour [10], Kisi [11], Bonakdari et al. [12], Emiroglu et al. [13] and Kisi et al. [14] are among the many researchers who calculated \(C_{d}\) of labyrinth weirs using soft computing methods.

Huang et al. [15] introduced the extreme learning machine (ELM) algorithm for single-layer feed-forward artificial neural networks (ANNs). Unlike conventional training algorithms built on gradient descent, such as the back propagation commonly applied to ANNs, ELM assigns the hidden-layer parameters randomly, which greatly reduces the time needed to train the network. Researchers have observed that with ELM the learning process is significantly faster while producing reliable generalization performance [16]. Several researchers [17–22] have used ELM to solve data-driven problems in different scientific fields.

The aim of the present research is precise determination of C d in a triangular labyrinth weir through three methods of ANN, ELM and genetic programming (GP). Afterward, the resulting C ds will be plotted and compared with some experimental results, which were found in the literature. Finally, some well-known statistical criteria are used to select the best estimation method.

2 Materials and methods

In this study, three models (ELM, ANN and GP) were designed to estimate C d of a triangular labyrinth weir. Some brief explanations of these three models are given here.

2.1 Artificial neural network

An artificial neural network (ANN), a concept inspired by the functioning of the human brain, is commonly employed to solve complicated problems in a wide range of sciences. In general, an ANN consists of linked nodes (so-called neurons) arranged in three kinds of layers: an input layer, one or more hidden layers and an output layer. Each layer is made up of a number of neurons. There is no fixed rule for the number of hidden layers or neurons; however, when the number of neurons in the hidden layers is extremely high, the network takes an unacceptably long time to train [23–25]. For this reason, different models were developed by trial and error, considering various numbers of neurons in the hidden layers, and the model with the best results was chosen as the final ANN model. MATLAB software was used to run the ANN model in this study.
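The ANN in this study was run in MATLAB; purely as an illustration of the trial-and-error selection of hidden neurons described above, the following Python sketch uses scikit-learn's MLPRegressor. The data arrays, the candidate range of 2–20 neurons and the tanh activation are assumptions for the example, not details taken from the paper.

```python
# Illustrative sketch (not the authors' MATLAB model): trial-and-error search
# over the number of hidden neurons for a single-hidden-layer ANN.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

# Hypothetical placeholders for the dimensionless inputs and observed C_d values.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((86, 6)), rng.random(86)
X_test, y_test = rng.random((37, 6)), rng.random(37)

best_model, best_r2 = None, -np.inf
for n_hidden in range(2, 21):                      # candidate hidden-layer sizes
    ann = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                       activation="tanh", max_iter=5000, random_state=0)
    ann.fit(X_train, y_train)
    r2 = r2_score(y_test, ann.predict(X_test))
    if r2 > best_r2:                               # keep the best-performing network
        best_model, best_r2 = ann, r2

print(f"selected hidden neurons: {best_model.hidden_layer_sizes}, R2 = {best_r2:.3f}")
```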

2.2 Genetic programming (GP)

GP can be seen as an evolutionary technique, and devising a rigorous theory for it is a very challenging task. GP was not commonly used as a search technique before the 1990s. GP evolves computer programs, which are traditionally represented in memory as tree structures (Fig. 1); a minimal code sketch of this representation is given after Fig. 1. Trees can be evaluated easily in a recursive manner: every internal node holds an operator function and every terminal node holds an operand, which makes both the evolution and the evaluation of mathematical expressions straightforward [26, 27]. GP therefore traditionally favours programming languages that naturally represent tree structures. Non-tree representations have also been proposed and successfully implemented, such as linear GP, which is closer to traditional imperative languages [28, 29]. Most non-tree representations contain structurally ineffective code (introns). Such noncoding genes may appear useless, since they do not influence the fitness of any individual; however, studies have shown faster convergence for program representations that allow such noncoding genes (like linear GP and Cartesian GP) than for tree-based representations, which do not have them. The two primary operators applied in evolutionary algorithms are crossover and mutation. In the crossover operator, applied to an individual, one of its nodes is simply exchanged with a node chosen from another individual in the population. In a tree-based representation, switching a node means replacing the whole branch, which increases the effectiveness of the crossover operator [28, 29]; the expressions produced by crossover can therefore look entirely different from their initial parents. Mutation, in contrast, affects a single individual by replacing a randomly chosen node. To keep the expression valid, mutation may only replace a node's information in a type-consistent way, so the kind of information stored in the node has to be taken into account; for example, mutation must distinguish binary operator nodes from terminals [28, 29]. Otherwise, the operator must be designed to cope with the resulting missing arguments.

Fig. 1

An example of GP expression tree
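As a toy illustration of the tree representation and subtree crossover discussed above (not the GP tool used in the study), the sketch below encodes expressions as nested tuples, evaluates them recursively and swaps whole branches between two parents. All names and the example expressions are made up for the illustration.

```python
# Minimal GP-style sketch: expression trees as nested tuples, recursive
# evaluation, and crossover that exchanges complete subtrees.
import copy, random, operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, x):
    """Recursively evaluate an expression tree at input value x."""
    if isinstance(node, tuple):                 # operator node: (op, left, right)
        op, left, right = node
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if node == "x" else node           # terminal node: variable or constant

def subtree_paths(node, path=()):
    """Collect index paths to every node so a crossover point can be chosen."""
    paths = [path]
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            paths += subtree_paths(child, path + (i,))
    return paths

def replace_at(node, path, subtree):
    """Return a copy of node with the subtree at 'path' replaced."""
    if not path:
        return copy.deepcopy(subtree)
    node = list(node)
    node[path[0]] = replace_at(node[path[0]], path[1:], subtree)
    return tuple(node)

def crossover(parent_a, parent_b, rng=random):
    """Swap a randomly chosen branch of parent_a with one taken from parent_b."""
    point_a = rng.choice(subtree_paths(parent_a))
    point_b = rng.choice(subtree_paths(parent_b))
    donor = parent_b
    for i in point_b:                           # walk down to the donor branch
        donor = donor[i]
    return replace_at(parent_a, point_a, donor)

# Example parents: (x * x) + 2  and  (x - 1) * 3
tree_a = ("+", ("*", "x", "x"), 2)
tree_b = ("*", ("-", "x", 1), 3)
child = crossover(tree_a, tree_b)
print(evaluate(tree_a, 3.0), evaluate(child, 3.0))
```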

2.3 Extreme learning machine (ELM)

ELM is one of the neural network models that has stepped into the spotlight in recent years. Owing to its simplicity, it has been utilized in a wide range of applications over the preceding decade [30]. Huang et al. [15] proposed the ELM algorithm as a tool to train single-layer feed-forward neural network (SLFN) architectures. ELM assigns the input weights randomly and determines the SLFN's output weights analytically. Since ELM benefits from a faster learning procedure and a greater generalization capability, it requires little intervention during analysis and therefore runs faster than common algorithms. ELM is also able to determine all of the network parameters, thereby minimizing trivial intervention. It can be seen as an effective algorithm with numerous merits, such as ease of use, fast learning speed, high performance and suitability for various nonlinear activation and kernel functions. ELM is designed such that L hidden neurons constitute the SLFN [15], which can learn L distinct samples with zero error. The hidden nodes are assigned random values, while the output weights are computed through the pseudo-inverse of H with minimum error, even when the number of distinct samples (N) is larger than the number of hidden neurons (L). The hidden-node parameters of ELM, \(a_{i}\) and \(b_{i}\), can simply be assigned random values and need not be tuned during the training stage.

Theorem 1

According to Liang et al. [22], for an SLFN with L additive or RBF hidden nodes and an activation function g(x) that is infinitely differentiable in any interval of R, and for L arbitrary distinct input vectors \(\{x_{i} \mid x_{i} \in \mathbb{R}^{n},\; i = 1, \ldots, L\}\) and hidden-node parameters \(\{(a_{i}, b_{i})\}_{i=1}^{L}\) randomly generated from any continuous probability distribution, the hidden-layer output matrix H is invertible with probability one and \(\left\| H\beta - T \right\| = 0\) [15, 19, 22].

Theorem 2

Pursuant to Liang et al. [22], given any small positive value \(\varepsilon > 0\) and an activation function g(x): R → R that is infinitely differentiable in any interval, there exists \(L \le N\) such that, for N arbitrary distinct input vectors \(\{x_{i} \mid x_{i} \in \mathbb{R}^{n},\; i = 1, \ldots, N\}\) and for any \(\{(a_{i}, b_{i})\}_{i=1}^{L}\) randomly generated from any continuous probability distribution, \(\left\| H_{N \times L}\beta_{L \times m} - T_{N \times m} \right\| < \varepsilon\) with probability one [19, 22].

Based on the above, the ELM's hidden-node parameters need not be tuned during training and can simply be assigned random values. The output weights are then obtained from the linear system of Eq. (1):

$$\beta = H^{ + } T$$
(1)

where \(H^{+}\) represents the Moore–Penrose generalized inverse of the hidden-layer output matrix H. Various approaches, such as orthogonal projection, iterative methods and singular value decomposition (SVD), can be used to compute it [19]. The orthogonal projection method can be employed only when \(H^{\mathrm{T}}H\) is non-singular, in which case \(H^{+} = (H^{\mathrm{T}}H)^{-1}H^{\mathrm{T}}\). Orthogonalization and iterative methods have restrictions because they involve searching and iteration. The ELM implementation uses SVD to calculate the Moore–Penrose generalized inverse of H, since SVD can be applied under all conditions. ELM therefore serves as a batch learning method [19].
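A minimal sketch of this training step is given below: random hidden-node parameters \(a_{i}\), \(b_{i}\), the hidden-layer output matrix H, and output weights from Eq. (1). The tanh activation, the number of hidden neurons and the placeholder data are assumptions for the example; np.linalg.pinv computes the Moore–Penrose pseudo-inverse via SVD.

```python
# Minimal ELM sketch following Eq. (1): random hidden-layer parameters (a_i, b_i),
# output weights beta from the Moore-Penrose pseudo-inverse of H.
import numpy as np

def elm_train(X, T, n_hidden, rng):
    """Fit a single-hidden-layer feed-forward network with random hidden nodes."""
    a = rng.standard_normal((X.shape[1], n_hidden))   # random input weights a_i
    b = rng.standard_normal(n_hidden)                  # random biases b_i
    H = np.tanh(X @ a + b)                             # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                       # Eq. (1): beta = H^+ T
    return a, b, beta

def elm_predict(X, a, b, beta):
    return np.tanh(X @ a + b) @ beta

rng = np.random.default_rng(0)
X_train, T_train = rng.random((86, 6)), rng.random(86)   # placeholder data only
a, b, beta = elm_train(X_train, T_train, n_hidden=20, rng=rng)
Cd_hat = elm_predict(X_train, a, b, beta)
print("training RMSE:", np.sqrt(np.mean((Cd_hat - T_train) ** 2)))
```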

2.4 Experimental model

The experimental data of Kumar et al. [4] have been used in this research to predict the discharge coefficient of the weir. The experimental setup was a 12-m-long rectangular channel with a width of 0.28 m and a height of 0.41 m. A triangular weir was used in this experiment (Fig. 2), placed 11 m from the channel entrance. Point gauges with a measurement precision of ±0.1 mm were used to measure the water level above the weir. A number of holes were provided in the channel wall and in the weir in order to aerate the nappe. Grid walls and flow straighteners were installed upstream of the channel in order to prevent the formation of vortices and reduce water-surface disturbance.

Fig. 2

Plan of the experimental channel used in Kumar et al. [4]

The hydraulic parameters of the Kumar et al. [4] experiments are listed in Table 1, and Table 2 shows the ranges of the parameters used in this study. The three models were trained with 86 data points and tested with 37 data points; a hypothetical sketch of this data preparation is given after Table 2.

Table 1 Hydraulic parameters used to estimate \(C_{d}\) in this study
Table 2 Parameters used to estimate average discharge coefficient (Kumar et al. [4])
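The sketch below only illustrates how the dimensionless inputs listed in the conclusions (L/h, L/w, h/b, sin θ, (sin θ)·w/L and y/((sin θ)·w)) could be assembled into a feature matrix and split into 86 training and 37 test samples. The variable names, ranges and values are random placeholders, not the laboratory data of Kumar et al. [4].

```python
# Hypothetical data-preparation sketch: build the dimensionless feature matrix
# and split it into the 86 training / 37 test samples mentioned in the text.
import numpy as np

def build_features(L, h, w, b, theta, y):
    """Stack the dimensionless groups used to predict C_d."""
    s = np.sin(theta)
    return np.column_stack([L / h, L / w, h / b, s, s * w / L, y / (s * w)])

rng = np.random.default_rng(0)
n = 123                                           # 86 training + 37 test runs
L, h, w = rng.uniform(0.3, 0.9, n), rng.uniform(0.05, 0.2, n), rng.uniform(0.1, 0.28, n)
b_, theta, y = rng.uniform(0.1, 0.3, n), rng.uniform(0.3, 1.2, n), rng.uniform(0.05, 0.3, n)

X = build_features(L, h, w, b_, theta, y)
X_train, X_test = X[:86], X[86:]                  # 86 training / 37 test samples
print(X_train.shape, X_test.shape)
```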

2.5 Statistical indices

In order to verify the accuracy of the \(C_{d}\) values estimated by the ANN, ELM and GP models, different statistical criteria, including the coefficient of determination (\(R^{2}\)), root-mean-square error (RMSE), mean absolute percentage error (MAPE), scatter index (SI) and relative error (\(\delta\)), are used, as defined in the following equations:

$$R^{2} = \left[ \frac{\sum\limits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)\left( y_{i} - \overline{y} \right)}{\sqrt{\sum\limits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)^{2} \sum\limits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2}}} \right]^{2}$$
(2)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {x_{i} - y_{i} } \right)^{2} } }$$
(3)
$${\text{MAPE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{{\left| {x_{i} - y_{i} } \right|}}{{x_{i} }}}$$
(4)
$${\text{SI}} = \frac{\text{RMSE}}{{\overline{x} }}$$
(5)
$$\delta \;\% = \frac{\sum\nolimits_{i = 1}^{n} \left| y_{i} - x_{i} \right|}{\sum\nolimits_{i = 1}^{n} y_{i}} \times 100$$
(6)

where \(y_{i}\) and \(x_{i}\) are the predicted (model) and observed (experimental) \(C_{d}\) values, respectively, and \(\overline{y}\) and \(\overline{x}\) are the average predicted and observed \(C_{d}\) values, respectively.
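For completeness, the indices above can be evaluated with a small helper like the one below; the sample values in the usage line are illustrative only and are not data from the study.

```python
# Evaluate the statistical indices defined above: x = observed C_d, y = predicted C_d.
import numpy as np

def indices(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    r2 = (np.sum((x - x.mean()) * (y - y.mean()))
          / np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))) ** 2
    rmse = np.sqrt(np.mean((x - y) ** 2))
    mape = np.mean(np.abs(x - y) / x)             # multiply by 100 for a percentage
    si = rmse / x.mean()
    delta = np.sum(np.abs(y - x)) / np.sum(y) * 100
    return {"R2": r2, "RMSE": rmse, "MAPE": mape, "SI": si, "delta_%": delta}

print(indices([0.60, 0.62, 0.65], [0.61, 0.62, 0.64]))   # illustrative values only
```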

3 Results and discussions

Figure 3 presents plots of the discharge coefficient (\(C_{d}\)) values estimated by the ELM, GP and ANN models versus the experimental values. As shown in this figure, the estimated results agree fairly well with the experimental values for all three models. It appears from this figure that the \(C_{d}\) values estimated by the ELM model are much closer to the experimental \(C_{d}\) values than those of the ANN and GP models.

Fig. 3

Comparison of estimated \(C_{d}\) values with experimental results in the training and test modes

Figure 4 indicates that in the training mode, more than 90, 70 and 65 % of the \(C_{d}\) data are estimated with a relative error smaller than 1.5 % by the ELM, ANN and GP models, respectively. In the ELM model, almost 100 % of the \(C_{d}\) data are estimated with an error below 2.5 %, whereas the ANN and GP models reach this level of coverage only at a relative error of about 5.5 %.

Fig. 4

Error distribution for the three models (training mode)

In the test mode, the predictions of the ELM, ANN and GP models are much closer to one another and are even better than their predictions in the training mode. Again, however, it can be said that, in general, the ELM model performs better than the ANN and GP models (Fig. 5).

Fig. 5

Error distribution for the three models (test mode)

Tables 3 and 4 present the statistical indices used to investigate the accuracy of the estimated \(C_{d}\) values for the training and test data, respectively. The values of \(R^{2}\), RMSE, MAPE, SI and \(\delta\) indicate good accuracy for all three ELM, ANN and GP models; in particular, the average relative error is almost 1 % for all three models. For the ELM model, \(R^{2}\) is 0.993 and 0.971 in the training and test modes, respectively, so this model is the best for predicting the \(C_{d}\) values, with the ANN and GP models ranked next. Tables 3 and 4 also reveal that the MAPE, RMSE, SI and \(\delta\) indices of the ELM model are much smaller than those of the ANN and GP models.

Table 3 Statistical indices for the three models (training mode)
Table 4 Statistical indices for the three models (test mode)

Table 5 compares the \(C_{d}\) values predicted by the ELM model with the experimental ones. It can be seen that the predicted values do not follow a specific trend; the model sometimes over-predicts and sometimes under-predicts the \(C_{d}\) values. The point to be noted, however, is that the model predicts relatively well under different hydraulic conditions, with a maximum relative error of approximately 2.27 %.

Table 5 Comparison of the \(C_{d}\) values predicted by the ELM model with those calculated in the experiment

4 Conclusions

Weirs are one of the means of controlling floods from dam reservoirs and of diverting and measuring flow in channels, and the discharge capacity over the weir crest is an important hydraulic parameter in this respect. In this research, the discharge coefficient of a triangular labyrinth weir was predicted using three intelligent models: extreme learning machine (ELM), genetic programming (GP) and artificial neural network (ANN). To that end, the dimensionless parameters L/h, L/w, h/b, sin θ, (sin θ)·w/L and y/((sin θ)·w) were used to train and test the designed models. The results of the ELM, ANN and GP models were compared with experimental results. Five statistical indices, \(R^{2}\), RMSE, MAPE, SI and \(\delta\), were used to compare the predicted and experimental \(C_{d}\) values. The examinations indicated that, with an \(R^{2}\) of 0.993 in the training mode, an \(R^{2}\) of 0.971 in the test mode, and minimum MAPE values of 0.81 % in the training mode and 0.89 % in the test mode, the ELM model presents the best results in comparison with the other models. The ANN model also presented relatively good results, similar to those of the ELM model.