1 Introduction

In recent years, much attention has been paid to the environment, especially to river and lake pollution. Rivers and streams commonly receive the outflow of sewage systems, which may cause pollutant levels to rise (Haghiabi 2016). Pollutant dispersion is a key element in water quality modeling (Antonopoulos et al. 2015), and the longitudinal dispersion coefficient (LDC) is an important factor in stream pollution modeling because of its effect on pollutant mixing intensity. Pollutants are transported by advective and dispersive processes and are dispersed longitudinally, transversely and vertically (Seo and Cheong 1998). Most experimental studies on the dispersion coefficient in streams are based on routing tracer concentration along the river (Atkinson and Davis 2000; Davis et al. 2000; Velísková et al. 2014; Disley et al. 2015; Parsaie and Haghiabi 2017). Estimating the dispersion coefficient directly, using tracers in rivers, is very difficult, expensive and time-consuming. Near a pollution point source, mixing is a three-dimensional process. Far from the injection point, after complete mixing, only longitudinal dispersion is used to describe the dispersion phenomenon (Chatila 1997; Velísková et al. 2014; Haghiabi 2017). Therefore, to predict longitudinal dispersion, researchers have developed different equations based on experimental and field measurements. These equations use hydraulic and geometric parameters such as channel width, mean flow velocity, shear velocity and flow depth (Alizadeh et al. 2017). Several researchers have used the ratio of channel width to flow depth (W/H) and the ratio of mean velocity to shear velocity (U/U*) as input parameters to estimate the LDC, based on their correlation with the dimensionless dispersion coefficient (Kx/HU*) (Noori et al. 2017). Table 1 presents some empirical equations for LDC estimation.
These equations have been derived using different methods and, in some cases, different datasets, and their performance varies with stream conditions. In this regard, investigators have recently used data-driven models to estimate the LDC. Data-driven models, including support vector machine, M5 tree algorithm, differential evolution, genetic algorithm and genetic programming, have been used by Azamathulla and Wu (2011), Etemad-Shahidi and Taghipour (2012), Li et al. (2013), Sahay and Dutta (2009) and Sattar and Gharabaghi (2015), respectively. Because the LDC is used in devising water diversion strategies, designing treatment plants, intakes and outfalls, and studying the environment (Ho et al. 2002), validating LDC estimation models under different conditions is an important step. It should be noted that almost all empirical and data-driven models predict the longitudinal dispersion coefficient under simplifying assumptions, which could affect the accuracy of the model results (Sahin 2014). For accurate estimation of water quality parameters, uncertainty and sensitivity analyses must be performed along with water quality modeling (Nakhaei and Etemad-Shahidi 2012). Quantification of the error in water quality models could be used as a first step in risk assessment for water resources management and planning. Sources of uncertainty in water quality management can be classified as model-input, model-structure, model-parameter and measurement uncertainty (Radwan et al. 2004). Monte Carlo simulation is widely used for uncertainty analysis in hydrologic models (Mishra 2009). This method enables hydrologic modelers to study the effect of input parameter sets on the uncertainty of water quality parameters (Pasha and Lansey 2010).
Along with uncertainty analysis, sensitivity analysis helps modelers to evaluate the output range based on the range of input parameters, which could be used to determine the most influential parameters (Nakhaei and Etemad-Shahidi 2012). A non-dimensional sensitivity coefficient is used by hydrological and environmental scientists to analyze multivariable models (McCuen 1974; Saxton 1975; Rana and Katerji 1998; Hupet and Vanclooster 2001; Gong et al. 2006).

Table 1 Empirical and data-driven models for estimating LDC

This study evaluates the performance of empirical equations and data-driven models using statistical analysis. The novelty of this study lies in the sensitivity analysis of the most accurate empirical and data-driven models, which determines the effect of each parameter on LDC estimation. Sensitivity indices showed which parameter in each model played the main role in LDC estimation. In addition, uncertainty analysis was performed using non-repeating random data series produced by Monte Carlo simulation to investigate the behavior of the empirical and data-driven models. This makes it possible to assess the performance of each model under uncertainty in the input parameters.

2 Theory and Previous Studies

Transverse shear velocity and transverse mixing reach equilibrium after a certain timescale (Taylor 1954). The simplified 1-D advection-dispersion equation for steady flow conditions has the following form (Etemad-Shahidi and Taghipour 2012):

$$ \frac{\partial C}{\partial t}+U\frac{\partial C}{\partial x}={K}_x\frac{\partial^2C}{\partial {x}^2} $$
(1)

where C is the cross-sectional average concentration (kg/m3), U is the mean flow velocity (m/s), x is the direction of the mean flow (m), t is the time (s), and Kx is the longitudinal dispersion coefficient (m2/s) in the flow direction. Based on Rutherford (1994), important features of tracer profiles in laboratory and river channels can be illustrated using Eq. (1).
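As an illustration of how Eq. (1) shapes a tracer profile, the sketch below evaluates the standard analytical solution for an instantaneous slug injection in a uniform channel, C(x, t) = M / (A √(4πKx t)) · exp(−(x − Ut)² / (4Kx t)). The injected mass M, the cross-sectional area A and all numerical values are hypothetical and are not taken from the datasets of this study.

```python
import math

def slug_concentration(x, t, mass, area, U, Kx):
    """Cross-sectional average concentration (kg/m3) at distance x (m) and
    time t (s) after an instantaneous injection of `mass` (kg) at x = 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    spread = 4.0 * Kx * t  # equals 2 * variance of the tracer cloud (m2)
    return mass / (area * math.sqrt(math.pi * spread)) * math.exp(-(x - U * t) ** 2 / spread)

# Hypothetical values chosen only for illustration
mass, area, U, Kx = 10.0, 20.0, 0.5, 15.0   # kg, m2, m/s, m2/s
t = 3600.0                                   # s
peak_x = U * t                               # the peak is advected with the mean flow
c_peak = slug_concentration(peak_x, t, mass, area, U, Kx)
c_off = slug_concentration(peak_x + 500.0, t, mass, area, U, Kx)

print(f"peak location: {peak_x:.0f} m")
print(f"concentration at peak: {c_peak:.3e} kg/m3")
print(f"concentration 500 m downstream of peak: {c_off:.3e} kg/m3")
```

In this solution a larger Kx lowers the peak and widens the cloud, which is why errors in the estimated LDC translate directly into errors in predicted peak concentration.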

The LDC in streams is affected by a range of parameters. The most important parameters are density, viscosity, channel width, flow depth, mean flow velocity, shear velocity, bed slope, bed roughness, horizontal stream curvature (i.e., sinuosity), and bed shape factor (Seo and Cheong 1998; Guymer 1998; Etemad-Shahidi and Taghipour 2012). Due to the complexity of measuring these parameters, researchers have generally used hydraulic and geometric parameters such as channel width W, flow depth H, mean flow velocity U and shear velocity U*, which have important effects on the LDC. Based on the equilibrium between longitudinal shear velocity and vertical turbulent diffusion, Elder (1959) extended Taylor’s results for pipes to open channels and derived the following equation to estimate the LDC (Deng et al. 2001):

$$ {K}_x=5.93H{U}_{\ast } $$
(2)

where H and U* represent the flow depth (m) and shear velocity (m/s), respectively.

Fischer (1967) suggested that the transverse profile of the velocity is more important than the vertical profile for dispersion in natural streams and developed the following integral relation for the dispersion coefficient in natural streams having large width to depth ratios (Sahay 2011):

$$ {K}_x=-\frac{1}{A}{\int}_0^WH{u}^{\prime }{\int}_0^y\frac{1}{\varepsilon_tH}{\int}_0^yH{u}^{\prime }\, dy\, dy\, dy $$
(3)

where A is the cross-sectional area (m2); y is the coordinate in the lateral direction (m); u′ is the deviation of the local depth-averaged velocity from the cross-sectional mean velocity (m/s); W is the channel width (m); and εt is the transverse turbulent diffusion coefficient (m2/s). Due to the complexity of Eq. (3), Fischer (1967) developed the following simple and practical equation:

$$ {K}_x=0.011\left(\frac{U^2{W}^2}{H{U}_{\ast }}\right) $$
(4)

Seo and Cheong (1998) proposed an empirical expression based on the one-step method developed by Huber (1981), a robust regression method that gives reasonably good estimates even in the presence of moderately bad leverage points. Seo and Cheong (1998) used 59 sets of data from 26 U.S. streams to develop the following equation and showed its superiority over existing expressions:

$$ {K}_x=5.915{\left(\frac{W}{H}\right)}^{0.62}{\left(\frac{U}{U_{\ast }}\right)}^{1.428}\left(H{U}_{\ast}\right) $$
(5)

Deng et al. (2001) developed an analytical method based on Fischer’s triple integral expression for estimation of the LDC in rivers. They assumed that the uniform-flow formula is valid for local depth-averaged parameters. Their equation is theoretically based and clarifies the dispersion mechanism. According to Deng et al. (2001), velocity is the most sensitive of all input parameters in Eq. (6); a change of 10% in this parameter causes significant variation in the LDC:

$$ {\displaystyle \begin{array}{l}{K}_x=\left(\frac{0.15}{8\ {\varepsilon}_{t_0}}\right){\left(\frac{W}{H}\right)}^{5/3}{\left(\frac{U}{U_{\ast }}\right)}^2\left(H{U}_{\ast}\right)\\ {}{\varepsilon}_{t_0}=0.145+\frac{1}{3520}\left(\frac{U}{U_{\ast }}\right){\left(\frac{W}{H}\right)}^{1.38}\end{array}} $$
(6)

Kashefipour and Falconer (2002) established an equation for predicting the LDC in natural channels using 81 sets of field data in the U.S., by relating this process through dimensional and regression analysis to the main hydraulic parameters such as river depth, width, velocity and shear velocity. Kashefipour and Falconer (2002) applied multiple regression between parameter combinations, and a best fit simple equation was derived, as follows:

$$ {K}_x=10.612\ (HU)\left(\frac{U}{U_{\ast }}\right) $$
(7)

Kashefipour and Falconer (2002) used a linear combination of Eq. (7) and Seo and Cheong’s (1998) formulation to develop Eq. (8), which led to a further improved equation for predicting the LDC in streams:

$$ {K}_x=\left[7.428+1.775{\left(\frac{W}{H}\right)}^{0.62}{\left(\frac{U_{\ast }}{U}\right)}^{0.572}\right](HU)\left(\frac{U}{U_{\ast }}\right) $$
(8)

Zeng and Huai (2014) showed that the product of water depth and cross-sectional mean flow velocity has a higher linear correlation with the LDC than the product of water depth and shear velocity. Therefore, by combining the product of H and U with two other non-dimensional parameters, they proposed a new equation for the longitudinal dispersion coefficient:

$$ {K}_x=5.4\ {\left(\frac{W}{H}\right)}^{0.7}{\left(\frac{U}{U_{\ast }}\right)}^{0.13} HU $$
(9)

Sahin (2014) proposed an equation based on dimensional and least squares analysis, using 128 field data sets measured in 41 rivers in the U.S. as follows:

$$ {K}_x=48\ {\left(\frac{U}{U_{\ast }}\right)}^{0.47}{R}_hU $$
(10)

where Rh is the hydraulic radius (m), which was calculated assuming a rectangular channel section due to the lack of data on cross-section shape (Sahin 2014). Disley et al. (2015) developed an equation to estimate the LDC using combined datasets from five steeper headwater streams and 24 milder and larger rivers. This equation relates the LDC to hydraulic and geometric parameters of the stream and was developed using multiple regression analysis:

$$ {K}_x=3.563\ {\left(\frac{U}{\sqrt{gH}}\right)}^{-0.4117}{\left(\frac{W}{H}\right)}^{0.6776}{\left(\frac{U}{U_{\ast }}\right)}^{1.0132}H{U}_{\ast } $$
(11)

where g is the gravitational acceleration (m/s2).

Data-driven models have been widely used by researchers to estimate the LDC in streams. Sahay and Dutta (2009) developed an equation to estimate the LDC using the datasets of Deng et al. (2001) and a genetic algorithm:

$$ {K}_x=2{\left(\frac{W}{H}\right)}^{0.96}{\left(\frac{U}{U_{\ast }}\right)}^{1.25}\left(H{U}_{\ast}\right) $$
(12)

Etemad-Shahidi and Taghipour (2012) derived two interpretable equations to estimate the LDC using the M5 tree algorithm and 149 datasets from rivers around the world:

$$ \mathrm{If}\ \left(\frac{W}{H}\right)\le 30.6,\kern1em {K}_x=15.49{\left(\frac{W}{H}\right)}^{0.78}{\left(\frac{U}{U_{\ast }}\right)}^{0.11}\left(H{U}_{\ast}\right) $$
(13-a)
$$ \mathrm{If}\ \left(\frac{W}{H}\right)>30.6,\kern1em {K}_x=14.12{\left(\frac{W}{H}\right)}^{0.61}{\left(\frac{U}{U_{\ast }}\right)}^{0.85}\left(H{U}_{\ast}\right) $$
(13-b)

Table 1 summarizes selected equations and models for LDC estimation.
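For readers who wish to compare the formulas numerically, the following minimal sketch codes a subset of the equations above (Eqs. 2, 4, 5, 8, 9, 13-a and 13-b) as plain Python functions. The function names and the input values of the example reach are hypothetical; the coefficients are copied from the equations as printed in this section. `Us` stands for the shear velocity U*.

```python
def elder(W, H, U, Us):                  # Eq. (2); W and U are unused
    return 5.93 * H * Us

def fischer(W, H, U, Us):                # Eq. (4)
    return 0.011 * U**2 * W**2 / (H * Us)

def seo_cheong(W, H, U, Us):             # Eq. (5)
    return 5.915 * (W / H)**0.62 * (U / Us)**1.428 * H * Us

def kashefipour_falconer(W, H, U, Us):   # Eq. (8)
    return (7.428 + 1.775 * (W / H)**0.62 * (Us / U)**0.572) * H * U * (U / Us)

def zeng_huai(W, H, U, Us):              # Eq. (9)
    return 5.4 * (W / H)**0.7 * (U / Us)**0.13 * H * U

def m5(W, H, U, Us):                     # Eqs. (13-a) and (13-b)
    if W / H <= 30.6:
        return 15.49 * (W / H)**0.78 * (U / Us)**0.11 * H * Us
    return 14.12 * (W / H)**0.61 * (U / Us)**0.85 * H * Us

MODELS = {
    "Elder (1959)": elder,
    "Fischer (1967)": fischer,
    "Seo and Cheong (1998)": seo_cheong,
    "Kashefipour and Falconer (2002)": kashefipour_falconer,
    "Zeng and Huai (2014)": zeng_huai,
    "M5 (Eqs. 13-a, 13-b)": m5,
}

# Hypothetical reach: W and H in m, U and Us in m/s
reach = dict(W=50.0, H=2.0, U=0.5, Us=0.05)
estimates = {name: model(**reach) for name, model in MODELS.items()}
for name, kx in estimates.items():
    print(f"{name:<32s} Kx = {kx:8.2f} m2/s")
```

All functions share the same four-argument signature so that they can be swapped freely in later sensitivity or uncertainty calculations.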

3 Materials and Methods

3.1 Data and Statistical Analysis

A collection of distinct datasets measured in different streams was used in this study (Fischer 1968; Yotsukura et al. 1970; Godfrey and Frederick 1970; McQuivey and Keefer 1974; Nordin and Sabol 1974; Rutherford 1994; Graf 1995; Seo and Cheong 1998; Disley et al. 2015). The datasets contain geometric and hydraulic characteristics, namely channel width, flow depth, mean flow velocity, shear velocity and longitudinal dispersion coefficient (Appendix Table 8). Histograms of W, H, U, U*, Kx, W/H, and U/U* are illustrated in Fig. 1. The histogram of W/H implies that the studied cases vary from narrow rivers (W/H < 10) to very wide rivers (W/H > 100). The friction term in the form of U/U* (Seo and Cheong 1998) can be considered as the hydrodynamic characteristic of the river bed (Etemad-Shahidi and Taghipour 2012). The statistical values of the parameters are presented in Table 2.

Fig. 1 Histograms of all parameters

Table 2 Statistics of parameters used in this study

3.2 Sensitivity Analysis

Geometric and hydraulic characteristics such as channel width, flow depth, mean flow velocity and shear velocity may have some uncertainties in their value estimation. Poor estimation procedures, tracer loss, or measurements made in the advective zone are examples of such uncertainties in Kx values (Etemad-Shahidi and Taghipour 2012).

Sensitivity analysis was employed in order to identify which parameters have more influence on the dimensionless longitudinal dispersion coefficient. Model sensitivity is the rate of change in one factor as output with respect to change in another factor as input while the other parameters are kept constant (McCuen 1973), or how the variation in the output of a model (numerical or other) can be apportioned, qualitatively or quantitatively, to different sources of variation of input parameters (Saltelli et al. 2004).

A logical step in model development is to determine the parameters that most strongly affect the model results. A sensitivity analysis of these parameters can help guide future studies (Hamby 1994). The use of computer models in hydraulic engineering has increased, but this has not been accompanied by a corresponding increase in the sophistication of sensitivity analysis (Hall et al. 2009). Risk estimation that couples hydrodynamic, structural reliability and impact models provides additional motivation for improved sensitivity analysis (Dawson et al. 2005). Without a systematic method for exploring the model response to changes in inputs, model developers cannot form reliable intuitions about model behavior and interactions (Hall et al. 2009). Sensitivity analyses have been used to determine which parameter has the greatest effect on reducing output uncertainty, which parameters are negligible and can be eliminated from the final model, which inputs contribute most to output change, which parameters are strongly correlated with the output, and what results follow from changing each input parameter (Hamby 1994).

For the sensitivity analysis of the LDC models, the input parameters were set to their average values; one parameter was then varied within a defined domain, and this process was repeated for each of the remaining parameters. In this way, the output variability was estimated for small modifications of each input parameter, and the model sensitivity to variation in each parameter was determined. A general LDC model can be defined as follows:

$$ {K}_x=f\left({V}_1,{V}_2,\dots {V}_n\right) $$
(14)

where Vi represents input parameters. Based on Beven (1979), the variation of Kx can be written as:

$$ {K}_x+\Delta {K}_x=f\left({V}_1+\Delta {V}_1,{V}_2+\Delta {V}_2,\dots, {V}_n+\Delta {V}_n\right) $$
(15)

Expanding Eq. (15) in Taylor series, and ignoring second-order terms, leads to:

$$ \Delta {K}_x=\frac{\partial {K}_x}{\partial {V}_1}\Delta {V}_1+\frac{\partial {K}_x}{\partial {V}_2}\Delta {V}_2+\dots +\frac{\partial {K}_x}{\partial {V}_n}\Delta {V}_n $$
(16)

where the differentials \( \frac{\partial {K}_x}{\partial {V}_i} \) define the sensitivity of the estimated output to each model parameter. Let us set:

$$ {A}_S=\frac{\partial {K}_x}{\partial {V}_i}\approx \frac{\Delta {K}_x}{\Delta {V}_i} $$
(17)

where AS represents the absolute sensitivity of the output estimate to each input parameter. Differential analysis is typically much more demanding to implement than other sensitivity methods, yet it provides only comparable results. Using sensitivity analysis in partial derivative form is impractical due to its complexity (Gardner et al. 1981). In addition, because this method is valid only for small changes in parameter values, it becomes impractical when parameter variability takes realistic values (Hamby 1994).

Because the magnitudes of the parameters in the LDC equations differ, the absolute sensitivity values from Eq. (17) are unsuitable for comparison. Therefore, relative sensitivity values were used to compare the input parameters (Mount et al. 2013), in the form:

$$ {R}_s=\frac{\Delta {K}_x}{\Delta {V}_i}\frac{V_i}{K_x} $$
(18)

Relative changes or errors can be defined as in Saxton (1975):

$$ {R}_E=\frac{\Delta {K}_x}{K_x} $$
(19)

In Eq. (18), Rs is a dimensionless coefficient that expresses the fraction of the relative change in an input parameter that is transmitted to the relative change in the dependent parameter. It may be regarded as the sensitivity coefficient; for example, a sensitivity coefficient of 0.2 means that a 10% change in the input parameter Vi would cause a 2% change in the LDC (ΔKx/Kx) (Saxton 1975).
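Eqs. (18) and (19) can be evaluated with a simple one-at-a-time finite difference. The sketch below is only an illustration: it uses Eq. (7) as the model, a perturbation of 10% of each parameter value, and hypothetical baseline values rather than the averages of the datasets used in this study. `Us` stands for the shear velocity U*.

```python
def kx_eq7(W, H, U, Us):
    """Eq. (7): Kx = 10.612 (H U)(U / U*). W is accepted but does not enter."""
    return 10.612 * H * U * (U / Us)

def relative_sensitivity(model, base, name, fraction=0.1):
    """Return (Rs, RE) of Eqs. (18) and (19) for parameter `name`,
    using a finite perturbation dV = fraction * V with all others fixed."""
    k0 = model(**base)
    dv = fraction * base[name]
    perturbed = dict(base)
    perturbed[name] = base[name] + dv
    dk = model(**perturbed) - k0
    rs = (dk / dv) * (base[name] / k0)   # Eq. (18)
    re = dk / k0                          # Eq. (19)
    return rs, re

# Hypothetical baseline: W and H in m, U and Us in m/s
base = dict(W=50.0, H=2.0, U=0.5, Us=0.05)
results = {p: relative_sensitivity(kx_eq7, base, p) for p in base}
for p, (rs, re) in results.items():
    print(f"{p:>2s}: Rs = {rs:+.3f}, RE = {100 * re:+.1f}%")
```

For a power-law term with exponent n, this finite-difference Rs equals ((1.1)^n − 1)/0.1 rather than n itself, so a 10% step gives 2.1 instead of 2 for the velocity in Eq. (7), and a negative value for the shear velocity, which appears in the denominator.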

LDC models depend on four input parameters that vary widely in nature, so a large amount of real data is needed to investigate the performance of LDC estimation. The sensitivity and error analysis of the empirical and data-driven models was conducted at the mean values of the input and output parameters, on the assumption that the interaction between input parameters is negligible (Deng et al. 2001). The performance of the selected models was evaluated using two approaches: changing each input individually, and changing all inputs together, using random, non-repeating datasets. In addition, a global sensitivity analysis based on Saltelli et al. (2008) was performed to investigate the interaction between input parameters. To this end, the first-order sensitivity index (Si) was estimated for each input parameter. If the sum of all Si equals 1, the model is additive and there is no interaction between input parameters (Saltelli et al. 2008).

In this study, for each input parameter, 100 random, non-repeating datasets were produced within ±10% and ±20% of the value of that parameter for each available data series. To investigate the combined effect of all input parameters on LDC estimation, further random data series were produced based on ±10% and ±20% changes in all parameters. These datasets were applied to the selected models, and the minimum and maximum LDC estimates for each data series of every dataset were derived to analyze the performance of the models and to derive the uncertainty curves. Note that in the sensitivity and uncertainty analysis of each input parameter, the other input parameters were kept constant at their average values.
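The sampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: perturbations are drawn from a uniform distribution (the sampling distribution is an assumption), Eq. (9) is used as the example model, and the baseline values are hypothetical. `Us` stands for the shear velocity U*.

```python
import random

def zeng_huai(W, H, U, Us):
    """Eq. (9)."""
    return 5.4 * (W / H)**0.7 * (U / Us)**0.13 * H * U

def perturb(base, fraction, rng, names=None):
    """Scale the selected parameters by random factors in [1 - fraction, 1 + fraction]."""
    names = list(base) if names is None else names
    sample = dict(base)
    for n in names:
        sample[n] = base[n] * (1.0 + rng.uniform(-fraction, fraction))
    return sample

def uncertainty_band(model, base, fraction, n_samples=100, seed=0, names=None):
    """Return (min, max) of the model output over n_samples distinct random inputs."""
    rng = random.Random(seed)
    seen, values = set(), []
    while len(values) < n_samples:
        sample = perturb(base, fraction, rng, names)
        key = tuple(sorted(sample.items()))
        if key in seen:          # enforce non-repeating samples
            continue
        seen.add(key)
        values.append(model(**sample))
    return min(values), max(values)

# Hypothetical data series: W and H in m, U and Us in m/s
base = dict(W=50.0, H=2.0, U=0.5, Us=0.05)
k_base = zeng_huai(**base)
band10 = uncertainty_band(zeng_huai, base, 0.10)               # all parameters, +/-10%
band20 = uncertainty_band(zeng_huai, base, 0.20)               # all parameters, +/-20%
band10_U = uncertainty_band(zeng_huai, base, 0.10, names=["U"])  # velocity only

print(f"baseline Kx = {k_base:.2f} m2/s")
print(f"+/-10% all parameters: {band10[0]:.2f} to {band10[1]:.2f} m2/s")
print(f"+/-20% all parameters: {band20[0]:.2f} to {band20[1]:.2f} m2/s")
print(f"+/-10% velocity only:  {band10_U[0]:.2f} to {band10_U[1]:.2f} m2/s")
```

Repeating the call for every measured data series and plotting the minimum and maximum against the baseline estimate would give curves of the kind described as uncertainty curves in this section.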

3.3 Model Validation

The performance of the LDC models was evaluated using statistical measures, including the mean absolute error (MAE), the root mean square error (RMSE), the discrepancy ratio (DR) and the related accuracy. DR was defined by White et al. (1973) to evaluate the difference between measured and predicted values. If DR = 0, the predicted and measured values of the dispersion coefficient are identical; the model overestimates the measured value when DR > 0 and underestimates it when DR < 0. Accuracy is defined as the proportion of data points with DR between −0.3 and 0.3 in the total number of data (Seo and Cheong 1998):

$$ \mathrm{MAE}=\frac{1}{N}\sum \left|D{R}_i\right| $$
(20)
$$ RMSE=\sqrt{\frac{1}{N}\sum {\left(D{R}_i\right)}^2} $$
(21)
$$ DR=\log \frac{K_{x_p}}{K_{x_m}} $$
(22)

where \( {K}_{x_p} \) and \( {K}_{x_m} \) are the predicted and measured LDC, respectively.
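The measures of Eqs. (20) to (22) can be computed as in the following sketch. The logarithm is taken to base 10, which is an assumption here because Eq. (22) does not state the base, and the interval limits of the accuracy criterion are treated as inclusive. The measured and predicted values are hypothetical.

```python
import math

def discrepancy_ratios(predicted, measured):
    """Eq. (22): DR = log10(Kx_predicted / Kx_measured) for each data point."""
    return [math.log10(p / m) for p, m in zip(predicted, measured)]

def mae(dr):
    """Eq. (20): mean absolute value of DR."""
    return sum(abs(d) for d in dr) / len(dr)

def rmse(dr):
    """Eq. (21): root mean square of DR."""
    return math.sqrt(sum(d * d for d in dr) / len(dr))

def accuracy(dr, limit=0.3):
    """Percentage of data points with -limit <= DR <= limit."""
    return 100.0 * sum(1 for d in dr if -limit <= d <= limit) / len(dr)

# Hypothetical measured and predicted LDC values (m2/s)
measured = [10.0, 20.0, 40.0]
predicted = [10.0, 30.0, 10.0]

dr = discrepancy_ratios(predicted, measured)
print("DR:", [round(d, 3) for d in dr])
print(f"MAE = {mae(dr):.3f}, RMSE = {rmse(dr):.3f}, accuracy = {accuracy(dr):.1f}%")
```

With a base-10 logarithm, the band −0.3 ≤ DR ≤ 0.3 corresponds to predictions within roughly a factor of two of the measured value.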

4 Results and Discussion

Sensitivity analysis, in addition to statistical analysis, helps researchers understand the limitations and advantages of LDC models. Statistical measures, including MAE, RMSE and DR, of the empirical and data-driven models are given in Table 3. Histograms of the DR values are also shown in Fig. 2 for better comparison between models.

Table 3 Comparison of the performance of the various models
Fig. 2 Comparison of the DR values of different models

Table 3 shows that the Elder (1959) equation has the maximum error and minimum accuracy. This equation is suitable for streams with no transverse shear, and its poor accuracy illustrates the importance of transverse variation (Etemad-Shahidi and Taghipour 2012). About 98% of the DR values for this model are below −0.3, which demonstrates that the Elder equation underestimates the LDC. The McQuivey and Keefer (1974) model, with an RMSE of 2.04 and an accuracy of 9.15%, generally overestimates the LDC values in streams, with 89% of DR > 0.3. The error criteria for Fischer (1967) are lower than those for Elder (1959) and McQuivey and Keefer (1974), and its accuracy is higher. Sahin (2014) has the highest accuracy among all empirical models, followed by Zeng and Huai (2014), Liu (1977) and Kashefipour and Falconer (2002). The Disley et al. (2015), Seo and Cheong (1998) and Deng et al. (2001) models, with accuracies of 48.17%, 46.34% and 45.12%, respectively, give relatively accurate estimates of the LDC. Zeng and Huai (2014) has the lowest RMSE among all empirical formulas. The error of the Kashefipour and Falconer (2002) model is larger than that of some of the other empirical models, but its symmetry between lower and upper estimates makes this model suitable for LDC estimation (Etemad-Shahidi and Taghipour 2012). For Liu (1977), Seo and Cheong (1998) and Deng et al. (2001), overpredicted cases outnumber underpredicted cases by factors of 2.15, 2.14 and 1.57, respectively. For Kashefipour and Falconer (2002), however, the numbers of overpredicted and underpredicted cases are equal, which improves the performance of this formula in LDC estimation. This result is consistent with the findings of Etemad-Shahidi and Taghipour (2012).

The genetic algorithm model has the lowest accuracy among the data-driven models; therefore, it was excluded from the sensitivity analysis. Based on Table 3, the M5 model has the highest accuracy and the lowest RMSE among all LDC estimation models. Finally, based on the statistical analysis, three empirical equations, namely Kashefipour and Falconer (2002), Sahin (2014) and Zeng and Huai (2014), and three data-driven models, namely M5, GE and DE, were selected for sensitivity analysis.

As mentioned above, model sensitivity is the rate of change in the LDC with respect to a change in one input parameter while the other parameters are kept constant; in other words, to investigate the direct effect of one parameter on the LDC, the effect of the other parameters is excluded by keeping their values constant. The averages of the input parameters for LDC estimation were calculated from the existing datasets and are presented in Table 4.

Table 4 Average value of input parameters used for LDC estimation

The results of the global sensitivity analysis based on Saltelli et al. (2008) are presented in Table 5, which gives the first-order sensitivity index (Si) for all selected models. The sum of Si for the LDC models is close to 1, which implies that these models are nearly additive, with weak interaction between input parameters.

Table 5 First-order sensitivity index (Si) of input parameters for selected models

In this study, ΔVi = 0.1Vi was used to estimate the relative sensitivity coefficient and the relative error. The sensitivity analysis of the selected empirical and data-driven models is presented in Table 6. The rate of change of the LDC for ±10% and ±20% changes in each input parameter, with the other parameters held constant, is illustrated in Fig. 3. For the M5 model, which contains two equations, the dataset was divided into two domains, and each domain was used with the corresponding equation according to the model criterion. Rs is one of the most important sensitivity indicators for multivariable models. Parameters with large values of Rs have a strong effect on the LDC. The estimated error in the LDC caused by changing each parameter is given by RE (Table 6).

Table 6 Sensitivity analysis of selected empirical and data driven models
Fig. 3 Sensitivity analysis of (a) Kashefipour and Falconer (2002); (b) Sahin (2014); (c) Zeng and Huai (2014); (d) M5 Eq. (13-a); (e) M5 Eq. (13-b); (f) DE model; (g) GE model

Mean flow velocity has the largest Rs in the Kashefipour and Falconer (2002) equation: when the velocity increases by 10%, the LDC increases by about 18%, as shown in Table 6. Shear velocity has an inverse effect on the LDC; a 10% increase in shear velocity leads to a decrease of about 6.7% in the LDC. The effect of changes in the input parameters on the LDC computed by the Kashefipour and Falconer (2002) equation is presented in Fig. 3a. According to Table 6, mean flow velocity and channel width have the highest and lowest effect on the LDC, respectively, in the Sahin (2014) model. Fig. 3b shows that the channel width has no influence on the LDC in this model. In the Zeng and Huai (2014) model, mean flow velocity, channel width and flow depth have the highest Rs, in that order (Table 6; Fig. 3c). This model has the lowest sensitivity to shear velocity among all empirical and data-driven models.

The M5 algorithm proposed two piecewise equations for LDC estimation. For this reason, the sensitivity analysis was performed for both nonlinear equations of this model. The splitting value for W/H is approximately 30, close to the value obtained by Papadimitrakis and Orphanos (2004). In narrow rivers with W/H ≤ 30.6, shear velocity and channel width are more important than flow depth and velocity; therefore, W/H is more important than U/U* for LDC estimation. In wider rivers, where W/H > 30.6, mean flow velocity has the highest value of Rs; hence, U/U* has the main effect on the LDC (Table 6). A possible interpretation is that Kx may be less influenced by the W/H ratio in very wide rivers than in narrow rivers. This result is in agreement with the findings of Papadimitrakis and Orphanos (2004) and Etemad-Shahidi and Taghipour (2012). In addition, Fig. 3d, e and g show that in the M5 and GE models the shear velocity has a direct effect on the LDC, which is not consistent with the empirical equations. The DE model developed by Li et al. (2013) behaves similarly to the empirical models, with the mean flow velocity having the highest effect and the shear velocity an inverse effect on LDC estimation (Table 6; Fig. 3f). In the GE model, the most influential parameters for the LDC are, in descending order of importance, the flow depth, the mean flow velocity, the channel width and the shear velocity. The shear velocity in the GE model has an uncommon effect on the LDC, as shown in Fig. 3g, where an increase and a decrease of this parameter have the same impact on the LDC. It should be noted that the models developed by GE have a complicated structure, which makes them harder to interpret than the other models.

Table 7 and Fig. 4 present the Rs values of the selected empirical and data-driven models for better comparison. Mean flow velocity is the most sensitive parameter in the empirical equations and in some of the data-driven models (M5-Eq. 13-b and DE), which is consistent with the findings of Deng et al. (2001) and Haghiabi (2017). Therefore, the performance of these formulas depends heavily on the velocity value. Comparison of the two equations developed by the M5 model shows that the role of mean flow velocity is more pronounced in relatively wide rivers than in narrow rivers, which is in agreement with the findings of Rutherford (1994). In addition, shear velocity is the most sensitive of the four input parameters in narrow streams. This was also reported by Tayfur and Singh (2005), who stated that an ANN model can yield satisfactory predictions of the LDC in narrow streams if the shear velocity is used as the only input parameter.

Table 7 Relative sensitivity (Rs) of all models
Fig. 4 Histogram of relative sensitivity of all models

Table 7 shows that the channel width is the most sensitive parameter in M5 Eq. (13-a) model, while Kashefipour and Falconer (2002) has the least sensitivity to channel width. Flow depth has the most impact on Sahin (2014) formula, while M5-Eq. (13-a) has the least sensitivity to flow depth among all models of LDC estimation. Velocity has the highest effect on Kashefipour and Falconer (2002) and this parameter is the least sensitive parameter in M5-Eq. (13-a). M5-Eq. (13-a) and Zeng and Huai (2014) have the most and the least sensitivity to shear velocity among others.

Uncertainty curves for the empirical and data-driven models are presented in Figs. 5, 6, 7, 8, 9 and 10. They are based on Monte Carlo simulation, which was used to produce new datasets from ±10% and ±20% changes in each input parameter. For interpretation of the figures, all data were arranged in descending order of LDC value.

Fig. 5 Performance of Kashefipour and Falconer (2002) using Monte Carlo simulation

Fig. 6 Performance of Sahin (2014) using Monte Carlo simulation

Fig. 7 Performance of Zeng and Huai (2014) using Monte Carlo simulation

Fig. 8 Performance of M5 using Monte Carlo simulation

Fig. 9 Performance of DE using Monte Carlo simulation

Fig. 10 Performance of GE using Monte Carlo simulation

According to the Monte Carlo simulation (Figs. 5, 6, 7, 8, 9 and 10), the Zeng and Huai (2014) equation had the lowest sensitivity to parameter changes and showed a smooth curve. The uncertainty of this equation, for all input parameters, was negligible in comparison with the other models. The two other empirical models, those of Kashefipour and Falconer (2002) and Sahin (2014), showed high uncertainty for higher LDC values. For high LDC values, the LDC estimated for ±20% changes in all input parameters exceeded the LDC calculated for the original data by about 80% and 60% for Kashefipour and Falconer (2002) and Sahin (2014), respectively. This was also observed for the DE and GE models. Based on Fig. 8, the M5 model is very sensitive to parameter changes. The W/H ratio changes with variation in flow depth and channel width, and in some cases the stream is converted from narrow to wide or vice versa. In such cases, the other equation of the M5 model is applied to the new dataset, which produces a large difference (a sudden jump) between the LDC estimated for the original dataset and that estimated for the new data generated by the Monte Carlo simulation for the same dataset. The GE model has a non-smooth uncertainty curve for high LDC values due to the complexity of the model.
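The sudden jump of the M5 model can be reproduced with a short calculation. The sketch below evaluates Eqs. (13-a) and (13-b) just below and just above the threshold W/H = 30.6 for a hypothetical stream; the input values are illustrative only, and `Us` stands for the shear velocity U*.

```python
def m5(W, H, U, Us):
    """Piecewise M5 model: Eq. (13-a) for W/H <= 30.6, Eq. (13-b) otherwise."""
    if W / H <= 30.6:
        return 15.49 * (W / H)**0.78 * (U / Us)**0.11 * H * Us
    return 14.12 * (W / H)**0.61 * (U / Us)**0.85 * H * Us

# Hypothetical stream with U/U* = 10; only the width is changed by about 3%
H, U, Us = 1.0, 0.5, 0.05          # m, m/s, m/s
k_narrow = m5(30.0, H, U, Us)      # W/H = 30.0, Eq. (13-a) applies
k_wide = m5(31.0, H, U, Us)        # W/H = 31.0, Eq. (13-b) applies
ratio = k_wide / k_narrow

print(f"Kx at W/H = 30.0: {k_narrow:.2f} m2/s")
print(f"Kx at W/H = 31.0: {k_wide:.2f} m2/s")
print(f"ratio: {ratio:.2f}")
```

Because the two branches have different exponents on U/U*, they do not meet at the threshold except at one particular value of U/U*, so a small perturbation of W or H that crosses W/H = 30.6 changes the estimate by far more than the perturbation itself.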

The results of the uncertainty analysis based on Monte Carlo simulation showed that the Zeng and Huai (2014) equation demonstrates less uncertainty to input parameter changes and is more reliable for estimation of LDC in comparison with the other models.

5 Conclusions

Because tracer studies are complex and time-consuming, many researchers have developed empirical and data-driven models to estimate the longitudinal dispersion coefficient for use in mathematical water quality models. Based on the statistical analysis, the M5 algorithm has the best accuracy and the lowest error among the studied empirical and data-driven models. Sensitivity analysis of the selected empirical and data-driven models showed that the Kashefipour and Falconer (2002) equation has the least sensitivity to channel width, so it could be used for rivers with variable width or where the measurement of this parameter is uncertain; in such cases, uncertainty in channel width would have the least effect on LDC estimation. M5-Eq. (13-a), DE and Zeng and Huai (2014) may be used for conditions with fluctuations in flow depth. Also, M5 has the least sensitivity to velocity for narrow rivers, which makes this model suitable for narrow streams with fluctuations in depth and velocity, such as meandering rivers. The results of the Monte Carlo simulation showed that the uncertainty of the Kashefipour and Falconer (2002), Sahin (2014), DE and GE models is very high for high LDC values. The M5 model results show some sudden jumps for W/H ratios around 30.6. Near this threshold (W/H = 30.6), the M5 model may apply different equations (Eqs. 13-a and 13-b) to the original and the new dataset, producing a significant difference between the LDC estimated for the two datasets. Some jumps occurred in the GE model for high LDC values due to the complexity of its equation. These jumps could have a negative effect on the performance of LDC estimation. However, the amount of measured data used here is relatively small for properly describing nonlinear processes. The effect of key factors will be better captured by empirical models as longitudinal dispersion continues to be measured in a wide range of streams (Gharabaghi and Sattar 2017).