Introduction

Flood risk management has gained significant attention in recent years, and considerable effort has been devoted to improving flood forecasting and estimation methods over time and space (Hsu et al. 2010). The variation of river stage (RS) over time is governed mainly by several factors, e.g., land cover, precipitation, topography, and vegetation, and the interaction between surface water and groundwater has been well documented (Marques et al. 2020; Shukla et al. 2021; Vishwakarma et al. 2023c). Accurate estimation of RS from river models, i.e., channel and hydrological models, contributes significantly to river flood estimation and enables rapid mapping of wetland zones (Liu et al. 2021). Knowledge of RS is also an important factor in watershed planning and management, water resources development, and watershed/aquifer management plans (Strupczewski et al. 2001; Khatibi et al. 2012; Shukla et al. 2021; Vishwakarma et al. 2023c). While RS can be estimated using hydrodynamic models, their large number of parameters makes them difficult to apply, so data-driven models present a possible alternative (Panda et al. 2010). Chau (2006) compared two machine learning models, a multilayer perceptron neural network (MLPNN) and the MLPNN optimized with a particle swarm optimization training algorithm (MLPNN-PSO), for forecasting RS one, two, and seven days ahead using data collected at the Shing Mun River, Hong Kong. The results showed that the MLPNN-PSO was more accurate than the MLPNN at all three forecasting horizons, with correlation coefficients (R) between 0.92 and 0.99. Chau (2007) applied a split-step particle swarm optimization algorithm (SPS) for forecasting RS one and two days ahead at the same river; the SPS was more accurate than the MLPNN, with R values of 0.986 and 0.979 for one and two days ahead, respectively. To predict daily RS in the Yangtze River, China, Wu et al. (2008) compared four machine learning models, namely support vector regression (SVR), nearest-neighbor regression (NNR), MLPNN, and linear regression (LR). The best accuracy was obtained with the SVR model (RMSE = 0.211 m), followed by the MLPNN optimized using a genetic algorithm (RMSE = 0.237 m), the LR (RMSE = 0.237 m), and the NNR (RMSE = 0.242 m).

Seo and Kim (2016) compared single and hybrid data-driven models for forecasting RS: the adaptive neuro-fuzzy inference system (ANFIS), the MLPNN, ANFIS combined with wavelet packet decomposition (ANFIS-WPD), and MLPNN combined with wavelet packet decomposition (MLPNN-WPD). To predict the RS at time (t), the authors used the RS measured at several previous lag times, i.e., from (t − 1) to (t − 6). The hybrid models, ANFIS-WPD (R2 = 0.999) and MLPNN-WPD (R2 = 0.988), performed better than the single models, ANFIS (R2 = 0.963) and MLPNN (R2 = 0.963). Alvisi and Franchini (2012) proposed a grey neural network (GNN) as a new modeling approach for more accurate RS prediction. The GNN was developed using data collected at the Reno River, Italy, and proved superior to a Bayesian neural network (BNN). In a different investigation, Fu et al. (2019) used multiple additive regression trees (MART) for hourly RS forecasting in the Bazhang River basin in southern Taiwan, comparing the MART, real-time MART, and naïve MART models. The real-time MART was more accurate at all three time steps, t + 1, t + 2, and t + 3, showing low RMSE and high R values.

For forecasting RS of the Schuylkill River at Berne, Pennsylvania, USA, Kisi (2011) developed a hybrid model (LR-DWT) coupling the discrete wavelet transform (DWT) with linear regression (LR). The author demonstrated that the LR-DWT model outperformed the MLPNN model, highlighting the significant role of the DWT in enhancing the linear model’s accuracy. Seo et al. (2016a) compared hybrid wavelet packet decomposition models (SVR-WPD, ANFIS-WPD, and MLPNN-WPD) with their single counterparts (SVR, MLPNN, and ANFIS) for forecasting RS in the Gam Stream watershed, South Korea. The results revealed the superiority of the SVR-WPD, which gave R2, RMSE, and MAE values of approximately 0.996, 0.0256, and 0.0127, respectively. Seo et al. (2016b) used a variety of methods to optimize the SVR parameters, including the genetic algorithm (GA), grid search (GS), artificial bee colony (ABC), and particle swarm optimization (PSO). According to the results, the SVR-PSO and SVR-ABC were more accurate than the other models, showing higher R2 values and lower RMSE and MAE values.

To maximize the usefulness of river stage studies, it is necessary to review the works already published on this subject. It is clear from the literature discussed above that machine learning has been able to meet the required forecasting quality, and multiple algorithms have been developed and successfully applied. The major findings of previous studies can be summarized as follows: (i) in most of the case studies discussed above, the proposed machine learning approach combines various features to improve forecasting accuracy, and comparisons were made between standalone machine learning models belonging to various categories, e.g., ANN, SVR, and Neuroplus, among others; (ii) the hybridization of machine learning with metaheuristic algorithms was found to be necessary for improving model performance, which makes the obtained results difficult to compare; (iii) few investigations have highlighted the contribution of preprocessing signal decomposition to improving river stage estimation, even though signal decomposition helps capture the high nonlinearity in the dataset. Furthermore, an important finding is that ensemble methods are rarely reported for river stage forecasting, whether with or without signal decomposition.

Therefore, accurate estimation of RS is essential for large-scale studies in water resources and flood management. However, RS estimation is difficult owing to the large number of variables influencing its fluctuation. The literature review above makes it clear that RS prediction using machine learning has been widely discussed and that many models have been proposed and successfully used. Although hybrid models based on signal decomposition have been suggested for RS forecasting, the performance of empirical mode decomposition (EMD) for this task has not been studied to date. This paper therefore presents a new modeling strategy for better prediction of RS using machine learning models [i.e., random forest regression (RFR), bootstrap aggregating (Bagging), adaptive boosting (AdaBoost), and artificial neural network (ANN)] combined with the EMD.

Materials and methods

Study site

River stage data were collected from two USGS stations (Fig. 1): (i) USGS 14210000, Clackamas River at Estacada, Clackamas County, Oregon (Latitude 45°18′00″, Longitude 122°21′10″ NAD27), and (ii) USGS 14211499, Kelley Creek at SE 159th Drive at Portland, Multnomah County, Oregon (Latitude 45°28′37″, Longitude 122°29′50″ NAD27). For the USGS 14210000 station, data were collected between January 1, 2002, and December 31, 2019, at a daily time scale (6574 records) and divided into training (70%) and validation (30%) subsets, with 4598 records used for training and 1970 for validation. Similarly, data for the USGS 14211499 station were collected between April 9, 2000, and December 31, 2020, at a daily time scale (7572 records) and split into training (70%) and validation (30%) subsets; as a result, 5297 records were used for training and 2269 for validation. The mean, maximum, minimum, standard deviation, and coefficient of variation of the daily river stage (RS), denoted Xmean, Xmax, Xmin, Sx, and Cv, are presented in Table 1. Daily RS was forecasted using only the RS measured at previous lag times; consequently, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to choose the most relevant time lags (Fig. 2). According to Fig. 2, river stage measurements at times (t − 1), (t − 2), (t − 3), (t − 4), (t − 5), and (t − 6) were chosen as input variables, while the output variable was the RS at time (t). In the second stage of the investigation, the selected input variables, i.e., the river stage measured at the various previous lags, were decomposed into several intrinsic mode functions (IMF) using the empirical mode decomposition (EMD) approach (Fig. 3), and the obtained IMFs were used as input variables. An eight-level decomposition was used in the current study, so the RFR, Bagging, AdaBoost, and ANN models all had forty-eight input variables. Figure 4 shows a flowchart of the modeling approaches used in the current study.
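To make this input-preparation step concrete, the sketch below builds the six lagged predictors, decomposes each into IMFs, and stacks them into the forty-eight-variable input matrix with a chronological 70/30 split. It is a minimal sketch, assuming the PyEMD package (EMD-signal); all column and variable names are illustrative, not taken from the original workflow.

```python
# A minimal sketch of the input-preparation stage, assuming the PyEMD
# package (pip install EMD-signal); names and settings are hypothetical.
import numpy as np
import pandas as pd
from PyEMD import EMD

def make_lagged_frame(rs: pd.Series, n_lags: int = 6) -> pd.DataFrame:
    """RS(t-1)..RS(t-n_lags) as predictors, RS(t) as the target."""
    frame = pd.DataFrame({"RS_t": rs})
    for lag in range(1, n_lags + 1):
        frame[f"RS_t-{lag}"] = rs.shift(lag)
    return frame.dropna()

def emd_features(frame: pd.DataFrame, n_imfs: int = 8) -> np.ndarray:
    """Decompose each lagged predictor into IMFs (6 lags x 8 IMFs = 48 inputs)."""
    emd = EMD()
    blocks = []
    for col in [c for c in frame.columns if c != "RS_t"]:
        imfs = emd(frame[col].to_numpy(), max_imf=n_imfs)
        comp = np.zeros((n_imfs, len(frame)))  # zero-pad if fewer IMFs emerge
        k = min(imfs.shape[0], n_imfs)
        comp[:k] = imfs[:k]
        blocks.append(comp.T)
    return np.hstack(blocks)

# Chronological 70/30 split, as in the study design:
# frame = make_lagged_frame(rs_series)
# X, y = emd_features(frame), frame["RS_t"].to_numpy()
# cut = int(0.7 * len(frame))
# X_train, X_valid, y_train, y_valid = X[:cut], X[cut:], y[:cut], y[cut:]
```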

Fig. 1
figure 1

Map showing the location of the two USGS stations

Table 1 Statistical parameters of the river stage for the two stations
Fig. 2
figure 2

Sample autocorrelation (ACF) and partial autocorrelation function (PACF) for daily river stage (RS)

Fig. 3
figure 3

Intrinsic mode functions (IMF) components of daily river stage (RS) dataset decomposed by the empirical mode decomposition (EMD) algorithm

Fig. 4
figure 4

Flowchart of the EMD & RFR & ANN & AdaBoost & Bagg models

Artificial neural network (ANN)

The structure of the artificial neural network (ANN) used in the present study is shown in Fig. 5. The network consists of three distinct parts that fit together. First, the input layer collects the independent variables (i.e., the predictors), with one neuron attributed to each input variable. Second, the hidden layers contain several processing neurons, and third, the last layer, or output layer, has only one neuron (Shukla et al. 2021; Elbeltagi et al. 2022). This kind of ANN model is called a multilayer perceptron (MLP). The mathematical operations of each layer can be described briefly as follows: each neuron in the hidden layer computes a weighted sum of the independent variables available in the input layer (Saroughi et al. 2023); hence, for each one, we have:

$${\delta }_{j}=\sum_{i=1}^{n}({W}_{ij}\times {x}_{i})+{\theta }_{j}$$
(1)

δj is the weighted sum of hidden neuron j, θj is the bias of hidden neuron j, Wij is the weight linking input neuron i to hidden neuron j, xi corresponds to one of the n input variables, and n is the number of input neurons. The computed δj is then passed through an activation function; in general, the nonlinear transfer function is the sigmoid:

Fig. 5
figure 5

Architecture of the ANN model

$${y}_{j}=\frac{1}{1+{e}^{-{\delta }_{j}}}$$
(2)

The obtained yj value of each hidden neuron is then transferred to the output neuron as follows:

$${\gamma }_{k}=\sum_{j=1}^{m}({W}_{jk}\times {y}_{j})+{\vartheta }_{1}$$
(3)

The term γk corresponds to the activation value of the single output neuron; Wjk is the weight linking hidden neuron j to output neuron k (k = 1), and \(\vartheta \) is the bias of the output layer. Similar to the hidden neurons, the output neuron uses an activation function to provide the final response; here it is the linear activation function. Hornik et al. (1989), Hornik (1991), and Simon (1999) provide additional information on the ANN paradigm.
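As a hedged illustration of Eqs. (1) to (3), the sketch below configures a single-hidden-layer MLP with sigmoid hidden units and a linear output neuron via scikit-learn; the hidden-layer size and training settings are assumptions, since they are not specified above.

```python
# A sketch of the MLP defined by Eqs. (1)-(3): sigmoid hidden units and a
# linear output neuron. Hyperparameter values are illustrative assumptions.
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ann = make_pipeline(
    StandardScaler(),              # inputs x_i scaled before the sums of Eq. (1)
    MLPRegressor(
        hidden_layer_sizes=(10,),  # m hidden neurons (illustrative)
        activation="logistic",     # the sigmoid of Eq. (2)
        max_iter=2000,
        random_state=42,
    ),                             # output activation is linear for regression
)
# ann.fit(X_train, y_train); rs_hat = ann.predict(X_valid)
```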

Random forest regression (RFR)

Random forest regression (RFR), developed by Breiman (2001), is an improved version of the original classification and regression tree (CART) and consists of an ensemble of decision tree (DT) models (Wang et al. 2021; Achite et al. 2023; Kumar et al. 2023), each of which uses only a part of the overall dataset (Sun et al. 2021; Kumar et al. 2023), i.e., its own bootstrap subset (Fig. 6). Samples are drawn into each subset with equal probability, and each weak tree is trained on a different sample subset, while the final response of the RFR model is obtained by aggregating the responses of all single trees (Bhadoria et al. 2021). The overall RFR algorithm can be summarized as follows (see the sketch after Fig. 6): (i) the training dataset is randomly divided into K subsets with replacement using the bootstrap sampling method (Xue et al. 2021); (ii) one CART is grown on each subset; (iii) approximately two-thirds of the dataset is used for growing each tree and the remaining one-third for calculating the out-of-bag (OOB) error, with each tree grown to its maximum depth without pruning; (iv) the final output is obtained by aggregation, i.e., averaging for regression tasks, or by majority voting for classification tasks (Xue et al. 2021; Lin et al. 2021).

Fig. 6
figure 6

Structure of the random forest regression (RFR) model. OOB stands for out-of-bag
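A minimal sketch of steps (i) to (iv) follows, using scikit-learn's RandomForestRegressor; the number of trees and other hyperparameter values are illustrative assumptions, not the study's settings.

```python
# A sketch of the RFR algorithm summarized above; settings are assumptions.
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor(
    n_estimators=500,   # K bootstrap subsets, one tree each (illustrative)
    max_depth=None,     # grow each tree fully, without pruning
    bootstrap=True,     # random sampling with replacement, step (i)
    oob_score=True,     # the ~1/3 held out per tree gives the OOB error, step (iii)
    random_state=42,
)
# rfr.fit(X_train, y_train)
# print("OOB score (R^2):", rfr.oob_score_)  # predictions are averages over trees
```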

Bootstrap aggregating (Bagging)

The Bagging algorithm, an abbreviation of bootstrap aggregating, was developed by Breiman (1996). Bagging is an ensemble algorithm based on the idea of majority voting, proposed to improve the performance of weak learners through a bootstrap aggregating mechanism (Pham et al. 2017). The Bagging algorithm draws a series of parallel subsets, also called instances, from the original dataset (Fig. 7a) and allocates a single training algorithm to each subset (Dou et al. 2020). Although the size of each subset is nearly equal to the size of the original dataset, the random sampling with replacement introduces duplicates and/or omissions relative to the original dataset, which is what differentiates the subsets (Hsiao et al. 2020; Gu et al. 2022). Using the same learning algorithm on every subset, the final response of the Bagging model is obtained by aggregating the responses of all subsets, by majority voting for classification or averaging for regression (Tien Bui et al. 2016); a minimal sketch is given after Fig. 7. Over the last few years, several applications of the Bagging algorithm have appeared in the literature, for example, flood probability mapping (Yariyan et al. 2020), prediction of PM2.5 concentration (Qiao et al. 2020), and landslide susceptibility mapping (Hu et al. 2021).

Fig. 7
figure 7

Bagging and boosting architectures: a Bagging creates multiple datasets through random sampling with replacement, and b Boosting creates multiple datasets through random sampling with replacement over weighted data (adapted from Yang et al. (2019))
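The Bagging scheme of Fig. 7a can be sketched as follows with scikit-learn, pairing one regression tree with each bootstrap subset; all settings are illustrative assumptions, and the `estimator` keyword requires scikit-learn 1.2 or later.

```python
# A sketch of Bagging: parallel bootstrap subsets, each about the size of
# the original set, the same learner on every subset, predictions averaged.
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bagg = BaggingRegressor(
    estimator=DecisionTreeRegressor(),  # the same weak learner on every subset
    n_estimators=100,                   # number of parallel subsets (illustrative)
    max_samples=1.0,                    # each subset ~ size of the original dataset
    bootstrap=True,                     # random sampling with replacement
    random_state=42,
)
# bagg.fit(X_train, y_train)  # for regression, responses are aggregated by averaging
```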

Adaptive boosting algorithm (Adaboost)

Similar to the Bagging algorithm, the Boosting algorithm belongs to the category of ensemble algorithms. The most significant difference between the two (Fig. 7b) is that Boosting generates the weak models sequentially, each depending on the previous prediction results, while Bagging generates them in parallel (Zounemat-Kermani et al. 2021). Boosting was first proposed by Bartlett et al. (1998) and was later described in more depth by Schapire (2003). From a computational point of view, Boosting weights each weak learner according to its contribution to the final prediction during the training phase, indirectly and proportionally to its calculated error (González et al. 2020). Thus, the Boosting algorithm assigns a different weight to each learner and proceeds by updating and optimizing the poorest calculated errors; hence, the training dataset is reweighted so that the samples the poorest learner misses receive a larger weight in the updated training dataset (Kotsiantis 2011).

One of the most well-known ensemble algorithms is AdaBoost (Freund and Schapire 1997), which consists of two distinct components: the forward "step-by-step" (stagewise) algorithm and the "addition" (additive) model (Tang et al. 2020). AdaBoost aggregates the outputs of a series of weak learners as follows (i.e., the "addition" step):

$$Z\left(x\right)=\sum_{t=1}^{M}{W}_{t}{h}_{t}\left(x\right)$$
(4)

Z(x) denotes a linear combination of the weak learners; ht(x) corresponds to one of the weak learners; and Wt denotes the weight attributed to the corresponding weak learner. As AdaBoost is an iterative algorithm, the weight values are updated at each step; during the forward "step-by-step" pass, the model obtained in the previous iteration is used for training the learner of the next step, expressed as follows (Kawakita et al. 2005; Tang et al. 2020):

$$Z{\left(x\right)}_{m}=Z{\left(x\right)}_{m-1}+{W}_{m}{h}_{m}\left(x\right)$$
(5)

Z(x)m−1 represents the linear combination of all weak learners from the previous iteration (Kawakita et al. 2005; Tang et al. 2020). AdaBoost has been successfully applied to several tasks, among them the prediction of fecal coliforms in rivers (EL Bilali et al. 2021), crude oil price prediction (Busari and Lim 2021), and spatial modeling of snow avalanche susceptibility using hybrid and ensemble machine learning techniques (Akay 2021).
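The additive model of Eqs. (4) and (5) can be sketched with scikit-learn's AdaBoostRegressor, which implements an AdaBoost.R2-style scheme; the weak-learner depth and the number of boosting steps M are illustrative assumptions.

```python
# A sketch of the sequential additive model of Eqs. (4)-(5); each step fits
# a weak learner on reweighted data and adds it with weight W_t.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),  # weak learner h_t(x)
    n_estimators=100,   # M sequential steps of Eq. (4) (illustrative)
    learning_rate=0.1,  # shrinks each learner's weight W_t
    random_state=42,
)
# ada.fit(X_train, y_train)  # the training data are reweighted at every step
```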

Assessment of the models' performance

The correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and mean absolute error (MAE) were used to assess how well the machine learning models performed for daily river stage (RS) forecasting. Their expressions are as follows (Samantaray et al. 2022; Markuna et al. 2023; Saroughi et al. 2023; Vishwakarma et al. 2023a, 2023b, 2023c):

$$ R = \frac{\frac{1}{N}\sum\limits_{i=1}^{N}\left(RS_{obs,i}-\overline{RS_{obs}}\right)\left(RS_{est,i}-\overline{RS_{est}}\right)}{\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}\left(RS_{obs,i}-\overline{RS_{obs}}\right)^{2}}\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}\left(RS_{est,i}-\overline{RS_{est}}\right)^{2}}},\quad(-1 < R \leqslant +1) $$
(6)
$$ NSE = 1-\frac{\sum\limits_{i=1}^{N}\left(RS_{obs,i}-RS_{est,i}\right)^{2}}{\sum\limits_{i=1}^{N}\left(RS_{obs,i}-\overline{RS_{obs}}\right)^{2}},\quad(-\infty < NSE \leqslant 1) $$
(7)
$$ RMSE = \sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}\left(RS_{obs,i}-RS_{est,i}\right)^{2}},\quad(0 \leqslant RMSE < +\infty) $$
(8)
$$ MAE = \frac{1}{N}\sum\limits_{i=1}^{N}\left|RS_{obs,i}-RS_{est,i}\right|,\quad(0 \leqslant MAE < +\infty) $$
(9)

\(\overline{RS}_{obs}\) and \(\overline{RS}_{est}\) are the mean measured and mean forecasted daily river stage (RS), respectively; RSobs,i and RSest,i are the observed and forecasted daily river stage (RS, in feet) for the ith observation; and N is the number of data points.
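The four criteria of Eqs. (6) to (9) transcribe directly into code; the sketch below assumes equal-length NumPy arrays of observed and forecasted stages.

```python
# A direct transcription of Eqs. (6)-(9).
import numpy as np

def evaluate(rs_obs: np.ndarray, rs_est: np.ndarray) -> dict:
    res = rs_obs - rs_est
    r = np.corrcoef(rs_obs, rs_est)[0, 1]                             # Eq. (6)
    nse = 1.0 - np.sum(res**2) / np.sum((rs_obs - rs_obs.mean())**2)  # Eq. (7)
    rmse = np.sqrt(np.mean(res**2))                                   # Eq. (8)
    mae = np.mean(np.abs(res))                                        # Eq. (9)
    return {"R": r, "NSE": nse, "RMSE": rmse, "MAE": mae}
```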

Results and discussion

Four machine learning models, namely the ANN, RFR, AdaBoost, and Bagging, were applied to river stage forecasting. The models were first developed without signal preprocessing, i.e., without the EMD algorithm, and in a second stage the EMD was used for signal decomposition. Based on the ACF and PACF, six input combinations were selected (Table 2), in which the river stage (RS) measured at previous lags was used as the input variables. Only the results from the validation stage are highlighted and discussed in depth; the results obtained at the two stations are presented below.

Table 2 Input combinations for different models

Results at the USGS 14210000

Results obtained at the USGS 14210000 station are reported in Table 3. Scatterplots of forecasted and measured daily river stage for the best models are shown in Fig. 8, and a comparison between forecasted and measured daily river stage is depicted in Fig. 9. Table 3 shows that, without EMD, the standalone ANN was slightly more accurate than the RFR, Bagging, and AdaBoost models. This finding is supported by the mean values of the four numerical performance criteria: the mean R, NSE, RMSE, and MAE values obtained using the ANN models were ≈0.938, ≈0.879, ≈0.508, and ≈0.219, respectively. These values were marginally better than those obtained using the RFR, corresponding to improvement rates of approximately ≈0.38%, ≈0.70%, ≈2.72%, and ≈2.52% (see the worked example below). The RFR and Bagging models performed similarly, with mean R, NSE, RMSE, and MAE values of 0.934, 0.872, 0.522, and 0.224, respectively. In contrast, the AdaBoost model performed the worst, with mean R, NSE, RMSE, and MAE values of ≈0.931, ≈0.866, ≈0.533, and ≈0.238, respectively. Comparing the models with respect to the input combination, it is evident that increasing the number of input variables, i.e., the number of lag times, improves model performance. From ANN1 to ANN5, the R, NSE, RMSE, and MAE values improved by 1.20%, 2.20%, 8.48%, and 10.63%, respectively, and the best performance was achieved using ANN5, which was more accurate than all other models. Between RFR1 and RFR4, the R and NSE values improved by ≈1.60% and ≈3.00%, while the RMSE and MAE values were noticeably reduced, by ≈10.85% and ≈15.20%, respectively. In terms of R, NSE, RMSE, and MAE, Bagg4 outperformed Bagg1 by 1.50%, 2.90%, 10.179%, and 14.45%, respectively. Finally, the AdaBoost models improved less than all the others: the improvement rates of R, NSE, RMSE, and MAE did not exceed ≈0.50%, ≈1.00%, ≈4.008%, and ≈8.974%, respectively. In any case, the margins between models were small, and no model kept improving once input variables were added beyond its best combination.
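For clarity, the improvement rates quoted in this section follow the usual relative-change convention, sketched below; the commented call reuses the mean ANN and RFR metrics reported above and is indicative only, not a reproduction of the study's exact figures.

```python
# Relative-change convention assumed for the quoted improvement rates:
# positive for increases in R/NSE and for decreases in RMSE/MAE.
def improvement(base: dict, new: dict) -> dict:
    gains = {k: 100.0 * (new[k] - base[k]) / base[k] for k in ("R", "NSE")}
    cuts = {k: 100.0 * (base[k] - new[k]) / base[k] for k in ("RMSE", "MAE")}
    return {**gains, **cuts}

# improvement({"R": 0.934, "NSE": 0.872, "RMSE": 0.522, "MAE": 0.224},  # RFR mean
#             {"R": 0.938, "NSE": 0.879, "RMSE": 0.508, "MAE": 0.219})  # ANN mean
```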

Table 3 Effectiveness of various forecasting models at the USGS 14210000 station
Fig. 8
figure 8

Scatterplots of measured against forecasted daily river stage (RS) for the validation stage at the USGS 14210000

Fig. 9
figure 9

A comparison of the measured and forecasted daily river stage (RS) for the validation stage at the USGS 14210000

The empirical mode decomposition (EMD) algorithm was used in the second stage of the investigation to decompose the signals. Each input variable, i.e., each river stage measured at a previous lag time, was divided into several intrinsic mode functions (IMFs), which were provided to the models as new input variables. Table 3 shows that all models improved their performance when using the EMD: the mean R, NSE, RMSE, and MAE of all models improved significantly. The best forecasting accuracy was achieved using the ANN5_EMD model, with mean improvement rates of approximately ≈2.78%, ≈5.35%, ≈26.40%, and ≈12.43% in R, NSE, RMSE, and MAE; compared to the ANN5 without EMD, the improvements were ≈3.30%, ≈6.50%, ≈33.47%, and ≈16.66%, respectively. This clearly illustrates the significant role the EMD played in capturing the high nonlinearity in the river stage dataset, particularly through its ability to reduce the error metrics, i.e., the RMSE and MAE values. Beyond the ANN models, the improvement in forecasting accuracy of the other models was less pronounced. For example, the mean R, NSE, RMSE, and MAE of the RFR were only slightly improved, by ≈0.71%, ≈0.98%, ≈3.79%, and ≈1.26%, respectively, using the RFR_EMD. In addition, the Bagg_EMD brought negligible improvement compared to the Bagging without EMD. The most notable observation concerns the AdaBoost_EMD compared to the AdaBoost, for which the improvement rates in terms of R and NSE did not exceed ≈0.2% and ≈0.33%, and the improvement rates of the RMSE and MAE were below ≈1.4% and ≈1.12%, respectively. Overall, the best forecasting accuracy was obtained using the ANN5_EMD, followed by the RFR5_EMD and the Bagg5_EMD on an equal footing, while the AdaBoost5_EMD was the least accurate model. Finally, Fig. 10 summarizes the best results in terms of a boxplot, violin plot, radar plot, and Taylor diagram.

Fig. 10
figure 10figure 10

Examples of graphs showing model performance for the best developed algorithms during the validation stage at the USGS 14210000: a Boxplot, b Violin plot, c Radar plot, and d Taylor diagram

Results at the USGS 14211499

Table 4 presents the findings at the USGS 14211499 station. Scatterplots of forecasted and measured daily river stage for the best models are shown in Fig. 11, and a comparison between forecasted and measured daily river stage is depicted in Fig. 12. Table 4 shows that the performance of the ANN models was marginally inferior to that of the RFR and Bagg models, with mean R, NSE, RMSE, and MAE values of approximately ≈0.932, ≈0.869, ≈0.182, and ≈0.084, respectively. The RFR and Bagg models performed equally well, with mean R, NSE, RMSE, and MAE values of approximately ≈0.935, ≈0.874, ≈0.179, and ≈0.084, respectively. The AdaBoost models were the least accurate, with mean R, NSE, RMSE, and MAE values of approximately ≈0.914, ≈0.832, ≈0.207, and ≈0.093, respectively; only the AdaBoost models had numerical performances that were nearly identical across input combinations, with barely detectable differences. Adding more input variables, from one to six, did not always result in a decline in the error metrics, i.e., RMSE and MAE, or an increase in the R and NSE values. It is evident that, for the RFR and Bagg models, the sixth input combination, which includes the previous six lag times of the river stage, produced the best accuracy (see Table 2). When using the ANN models, the ANN4 model had the highest R (≈0.935) and NSE (≈0.874) values and the lowest RMSE (≈0.179) and MAE (≈0.083) values, and the model's performance clearly declined after the fourth input combination. When the EMD was used to improve forecasting accuracy, the ANN4_EMD was the best model, showing high R and NSE values of ≈0.935 and ≈0.913 and the lowest RMSE (≈0.149) and MAE (≈0.066). The ANN4_EMD improved on the RFR6_EMD by ≈1.40%, ≈3.00%, ≈13.372%, and ≈31.25% in terms of R, NSE, RMSE, and MAE, respectively. The Bagg_EMD models were roughly equal to the RFR_EMD models, with negligible differences, and the AdaBoost_EMD models were the poorest regardless of the input combination. Figure 13 summarizes the best results in terms of a boxplot, violin plot, radar plot, and Taylor diagram.

Table 4 Effectiveness of various forecasting models at the USGS 14211499 station
Fig. 11
figure 11

Scatterplots of measured against forecasted daily river stage (RS) for the validation stage at the USGS 14211499

Fig. 12
figure 12

Comparison between measured and forecasted daily river stage (RS) for the validation stage at the USGS 14211499

Fig. 13
figure 13figure 13

Examples of graphs showing model performance for the best developed algorithms during the validation stage at the USGS 14211499: a Boxplot, b Violin plot, c Radar plot, and d Taylor diagram

Conclusion

An efficient hybrid machine learning algorithm is proposed in this study to forecast river stage (RS) with high precision and accuracy, and the impact of preprocessing signal decomposition using empirical mode decomposition (EMD) on RS forecasting is investigated. As the modeling framework links the RS at time (t) to the RS at previous lag times, the ACF and PACF were used to select the relevant input variables, and the RS measured from (t − 1) to (t − 6) was found to be the most significant set of inputs. In the second stage of the study, the six selected lag times were decomposed using the EMD, and the obtained intrinsic mode functions (IMF) were used as new input variables. The in situ measured RS was then estimated by the new hybrid models, and a comparison with the single models without the EMD was also carried out. The estimated daily RS was validated against in situ data collected at two USGS stations. The obtained results indicate that the newly proposed algorithm can retrieve RS with high accuracy and precision. The following conclusions are drawn:

  • Using the single models, i.e., ANN, RFR, Bagging, and AdaBoost without decomposition (i.e., without EMD), the four models were relatively equal, with slight superiority in favor of the ANN model, and it was shown that beyond the third input combination, i.e., using only RS(t − 1), RS(t − 2), and RS(t − 3) as input variables, the improvement in model performance was negligible and marginal.

  • Numerical results revealed that, for all models, the R, NSE, RMSE, and MAE ranged from 0.923 to 0.940, from 0.852 to 0.883, from 0.498 to 0.562, and from 0.216 to 0.250 at the USGS 14210000 station, and from 0.913 to 0.940, 0.831 to 0.883, 0.173 to 0.208, and 0.082 to 0.094 at the USGS 14211499 station, respectively.

  • Using the EMD as a preprocessing signal decomposition contributed a high and significant improvement in model performance. Increasing the number of lag times from one to six improved all numerical criteria: the R and NSE values reached maxima of 0.974 and 0.949 at the USGS 14210000 and of 0.955 and 0.913 at the USGS 14211499, respectively, while the RMSE and MAE values decreased drastically to their lowest values of 0.330 and 0.175 at the USGS 14210000 and of 0.149 and 0.076 at the USGS 14211499, respectively.

  • The results obtained in the present study appear very consistent and encouraging regardless of the considered period of record. However, extending the series of records and the forecasting horizon beyond the time (t), i.e., to t + 1 and t + 2, would allow more robust conclusions.