1 Introduction

River flow modeling plays a significant role in water resources management, including flood control, hydropower generation, watershed management, reservoir operation, and river sediment management (Khatibi et al. 2012). Because artificial intelligence (AI) techniques can generate accurate river flow rates from antecedent (i.e., historical) time series with minimal site information, they are becoming a popular modeling choice among river system forecasters (Nayak et al. 2004). AI techniques are simple to formulate, yet robust enough to handle the complex and nonlinear biophysical processes embedded in hydrological data series. Their major advantage over parametric modeling techniques is that they require no a priori assumptions about the relationships between the input variables and the output data (Gocić et al. 2015), and they are mathematically less complex to design and implement.

A variety of AI techniques have been applied to river flow prediction. These include, among others, artificial neural networks (Tawfik et al. 1997; Jain and Chalisgaonkar 2000; Deka and Chandramouli 2003; Sudheer and Jain 2003; Wu et al. 2005; Bhattacharya and Solomatine 2005; Habib and Meselhe 2006; Alvisi et al. 2006; Chen et al. 2006; Clemmens and Wahlin 2006; Kişi 2007; Baiamonte and Ferro 2007; Firat 2008; Khatibi et al. 2011; Aggarwal et al. 2012; Ajmera and Goyal 2012; Hasanpour Kashani et al. 2015; Ghorbani et al. 2016a; Khatibi et al. 2017; Zhang et al. 2018; Kashani and Soltangheys 2018), fuzzy logic (Alvisi et al. 2006), M5 model trees (Bhattacharya and Solomatine 2005; Ajmera and Goyal 2012), the adaptive neuro-fuzzy inference system (ANFIS) (Khatibi et al. 2011; Hasanpour Kashani et al. 2015; Zhang et al. 2018; Kashani and Soltangheys 2018), support vector machines (Sivapragasam and Muttil 2005; Aggarwal et al. 2012; Goel and Pal 2012; Ghorbani et al. 2016b; Zhang et al. 2018), genetic programming (Ghimire and Reddy 2010; Khatibi et al. 2011; Liu and Chung 2014; Hasanpour Kashani et al. 2015; Zhang et al. 2018), chaos theory (Khatibi et al. 2012), hybrid wavelet gene expression programming (WGEP) (Shoaib et al. 2015), wavelet artificial neural networks (WANN), hybrid ANN-PSO and ANN-GA (Zhang et al. 2018), hybrid SVM-QPSO (Ch et al. 2013), hybrid NNRF-PSO (Taormina and Chau 2014), hybrid ELM-BSO (Taormina et al. 2015), hybrid ANN-IIS-W (Prasad et al. 2017), and hybrid MLP-FFA (Khatibi et al. 2017). These studies have shown that many categories of AI techniques attain an acceptable level of accuracy in various water management and river flow prediction tasks. In addition to the above-mentioned AI techniques, some new hybrid intelligent methods have been developed (Deng et al. 2017a, b; Zhao et al. 2017; Deng et al. 2018, 2019a, b) that could be applied to the accurate modeling of water management issues.

In this paper, the authors utilize two less-explored AI techniques: the cascade correlation neural network (CCNN) and random forest (RF) models. Some investigators have already applied the CCNN model to river flow prediction. For example, Karunanithi et al. (1994) applied the CCNN model to predict the flow of the Huron River at the Dexter sampling station, and Diamantopoulou et al. (2007) developed a CCNN model with a Kalman learning rule to forecast the 1-day-ahead daily flow at Ilarionas station on the Aliakmon River, Northern Greece. In another study, Alok et al. (2013) used two neural network models, the Elman neural network (ENN) and the CCNN, to estimate river flow data of the Brahmani River in India, revealing the good ability of the CCNN model to evaluate river flow time series.

To benchmark the CCNN model, the random forest (RF) model, a nonparametric, rule-based regression algorithm, is also used in this study. RF predictions are obtained by averaging the predicted values of multiple regression (or decision) trees, each trained on a separate bootstrap resample of the training dataset. Zhao et al. (2012) used the RF model to select predictors from measured streamflow and a total of 74 hydro-climatic indices to predict seasonal streamflow, while Nguyen et al. (2015) investigated the RF model, the least absolute shrinkage and selection operator (LASSO), and support vector regression (SVR) for forecasting daily water levels at Thakhek station on the Mekong River. Shortridge et al. (2016) used ANN, RF, and other models to simulate the monthly streamflow of five highly seasonal rivers in the highlands of Ethiopia.

Although various studies have predicted water stage and river flow using ANN-based models, applications of the CCNN and RF models, as attempted in this paper, have been relatively limited compared with conventional ANN-based models. To address this gap, the novelty of the present study is to design, for the first time, CCNN and RF models for river flow prediction and to investigate their capability using data from two major river systems in Australia (i.e., the Dulhunty and Herbert Rivers), thereby broadening their use in hydrologic time series prediction.

2 Methodology

2.1 Cascade correlation neural networks (CCNN)

The CCNN model, developed by Fahlman and Lebiere (1990), is a special case of the artificial neural network (ANN) model, a parallel information processing system consisting of a set of neurons arranged in input, hidden, and output layers. These neurons apply suitable transfer functions to their weighted inputs. Unlike classical ANN models, the CCNN model automatically synthesizes its network architecture as part of the training process. It is an efficient constructive neural network that combines incremental structure building with a learning algorithm. Training starts with a minimal network consisting of an input and an output layer, with no hidden layer. When training can no longer reduce the residual error, this phase is stopped and the next phase, the training of a potential hidden node, begins. The potential hidden node receives connection weights from the input layer and from all preexisting hidden nodes, but it is not yet connected to the output layer (Karunanithi et al. 1994; Diamantopoulou et al. 2007; Kim et al. 2014; Zounemat-Kermani et al. 2019).

In a CCNN model, the incoming connection weights of the potential hidden node are optimized by gradient ascent to maximize the correlation between its output and the residual error of the network. While a potential hidden node is being trained, the connection weights of the output layer are kept unchanged. When the potential hidden node is added to the CCNN structure, it becomes a new hidden node, and its incoming connection weights are frozen for the remainder of training. After a hidden node has been installed, training updates all connection weights that directly feed the output layer. In this way, the CCNN model automatically constructs a suitable structure for a given problem (Karunanithi et al. 1994; Thirumalaiah and Deo 1998). Figure 1a shows the structure of the CCNN model.
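The training cycle described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: a single linear output unit for regression, tanh hidden units, output weights fitted by ordinary least squares (Fahlman and Lebiere used Quickprop), a single candidate per cycle instead of a candidate pool, and plain gradient ascent on the candidate score \( S = \left|\sum_p (V_p - \bar{V})(E_p - \bar{E})\right| \), where \( V_p \) is the candidate output and \( E_p \) the residual error for pattern p. The function names and the stopping tolerance `tol` are hypothetical.

```python
import numpy as np

def fit_output(F, y):
    """Least-squares output weights (assumption: stands in for Quickprop training)."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

def train_candidate(F, residual, n_steps=500, lr=0.05, seed=0):
    """Gradient ascent on S = |sum_p (V_p - mean V)(E_p - mean E)| for one tanh unit."""
    rng = np.random.default_rng(seed)
    v = rng.normal(scale=0.5, size=F.shape[1])
    e = residual - residual.mean()  # centered residual error
    for _ in range(n_steps):
        V = np.tanh(F @ v)
        sign = np.sign(np.sum((V - V.mean()) * e))
        # dS/dv = sign * sum_p e_p * (1 - V_p**2) * F_p  (valid because e sums to zero)
        v += lr * sign * ((e * (1.0 - V ** 2)) @ F) / len(F)
    return v

def cascade_correlation(X, y, max_hidden=5, tol=1e-4):
    """Grow hidden units one at a time; each sees the inputs and all earlier hidden units."""
    F = np.column_stack([np.ones(len(X)), X])  # bias + inputs (+ frozen hidden outputs later)
    w = fit_output(F, y)  # first phase: no hidden layer
    best = np.sqrt(np.mean((F @ w - y) ** 2))
    hidden = []  # frozen incoming weights of the installed hidden units
    for k in range(max_hidden):
        v = train_candidate(F, y - F @ w, seed=k)  # output weights are held fixed here
        F_new = np.column_stack([F, np.tanh(F @ v)])  # install the unit and freeze v
        w_new = fit_output(F_new, y)  # retrain only the weights feeding the output
        err = np.sqrt(np.mean((F_new @ w_new - y) ** 2))
        if best - err < tol:  # residual error no longer reduced: stop growing
            break
        F, w, best = F_new, w_new, err
        hidden.append(v)
    return hidden, w, best

def predict(X, hidden, w):
    """Rebuild the cascade for new inputs and apply the output weights."""
    F = np.column_stack([np.ones(len(X)), X])
    for v in hidden:
        F = np.column_stack([F, np.tanh(F @ v)])
    return F @ w
```

Each installed unit receives the bias, the inputs, and the outputs of all earlier hidden units, which is what makes the network a cascade; replacing the least-squares step with iterative training and a pool of candidates would bring the sketch closer to the original algorithm.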

Fig. 1

Structure of the models: a CCNN model (Kim et al. 2014) and b RF model (Rodriguez-Galiano and Atkinson 2016)

2.2 Random forest (RF)

The RF model consists of an ensemble of randomized regression trees: it constructs a multitude of regression trees and then aggregates them into a single prediction (Fig. 1b). The RF model provides a reliable error estimate from the out-of-bag (OOB) data, and it estimates covariate importance by randomly permuting the values of each covariate in the OOB sample and predicting the OOB samples with the permuted variable (Zhao et al. 2012). The resulting change in OOB error indicates the importance of that covariate in the dataset. The RF model depends on only two user-defined parameters, mtry (the number of covariates randomly sampled as split candidates) and ntree (the number of trees in the forest); in Hengl et al. (2015) and Were et al. (2015), these were optimized by iterating mtry from 1 to 10 (the total number of covariates) and ntree from 100 to 10,000 in increments of 100.
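To make the OOB error and the permutation importance concrete, the sketch below implements a small random forest from scratch with NumPy. It is an illustration under stated assumptions, not the configuration used in this study: trees split on the sum-of-squared-errors criterion, the hypothetical parameter `min_leaf` plays the role of the leaf size, and importance is computed with a simplified variant that permutes each covariate once across all rows and recomputes the OOB error (Breiman's original procedure permutes the OOB samples of each tree separately). All function names are hypothetical.

```python
import numpy as np

def build_tree(X, y, rng, mtry, min_leaf):
    """Grow one regression tree; each split searches only mtry randomly chosen covariates."""
    if len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())  # leaf: mean of its samples
    best = None  # (sum of squared errors, covariate, threshold)
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue  # respect the minimum leaf size
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:
        return float(y.mean())
    _, j, t = best
    left = X[:, j] <= t
    return (j, t, build_tree(X[left], y[left], rng, mtry, min_leaf),
            build_tree(X[~left], y[~left], rng, mtry, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (covariate, threshold, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def random_forest(X, y, ntree=25, mtry=2, min_leaf=5, seed=0):
    """Fit ntree trees on bootstrap resamples and record which rows each tree saw."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees, in_bag = [], np.zeros((ntree, n), dtype=bool)
    for k in range(ntree):
        idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
        in_bag[k, idx] = True
        trees.append(build_tree(X[idx], y[idx], rng, mtry, min_leaf))
    return trees, in_bag

def oob_mse(trees, in_bag, X, y):
    """MSE of each row's prediction averaged over the trees that did not see that row."""
    preds = np.array([[predict_tree(t, x) for x in X] for t in trees])
    oob = ~in_bag
    counts = oob.sum(axis=0)
    ok = counts > 0  # rows that were out of bag at least once
    oob_pred = (preds * oob).sum(axis=0)[ok] / counts[ok]
    return float(np.mean((oob_pred - y[ok]) ** 2))

def oob_permutation_importance(trees, in_bag, X, y, seed=0):
    """Increase in OOB MSE after permuting one covariate (simplified whole-column variant)."""
    rng = np.random.default_rng(seed)
    base = oob_mse(trees, in_bag, X, y)
    importance = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        importance.append(oob_mse(trees, in_bag, Xp, y) - base)
    return np.array(importance)
```

Because each row is scored only by trees that never saw it, the OOB error behaves like a built-in validation estimate, and a covariate whose permutation barely changes that error contributes little to the forest.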

2.3 Study area, data available, and performance criteria

The daily mean river stage and river flow time series of the Dulhunty and Herbert Rivers in Australia were used in this study (Table 1). The data were divided chronologically into two parts for training and testing: the training datasets comprised the first 80% of each time series (from 2014/05/22 to 2016/10/14), and the testing datasets covered the remaining 20% (from 2016/10/15 to 2017/05/21). Figure 2a, b shows the observed daily river stage and discharge, with the training and testing periods, for the Dulhunty and Herbert Rivers, respectively. The statistical parameters of the river stage and discharge data are given in Table 2, where Xmean, Sx, Cv, Csx, Xmax, and Xmin denote the mean, standard deviation, coefficient of variation, skewness, maximum, and minimum, respectively.
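The dates of the 80/20 split can be checked with a short calculation. The sketch below assumes one value per day with no gaps in the record and rounds the training fraction to the nearest whole day; the function name is hypothetical.

```python
from datetime import date, timedelta

def chronological_split(start, end, train_frac=0.8):
    """Return (n_days, n_train, last training date, first testing date) for a daily record."""
    n_days = (end - start).days + 1  # both end points are included
    n_train = round(train_frac * n_days)  # nearest whole day
    last_train = start + timedelta(days=n_train - 1)
    return n_days, n_train, last_train, last_train + timedelta(days=1)

print(chronological_split(date(2014, 5, 22), date(2017, 5, 21)))
```

For the 2014/05/22 to 2017/05/21 record (1,096 days), this gives 877 training days ending on 2016/10/14 and a testing period starting on 2016/10/15, consistent with the dates stated above.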

Table 1 Physiographical characteristics of river stage and discharge measurement stations
Fig. 2

Time series plots of the observed stage and discharge data for the period 2014/05/22 to 2017/05/21: a Dulhunty River and b Herbert River

Table 2 Statistical characteristics of the river daily stage and discharge data sets

To validate the performance of the models, diagnostic plots and statistical score metrics were employed in the testing phase. These metrics are defined as follows:

  I. Root mean square error (RMSE) (Willmott and Matsuura 2005) is expressed as:

    $$ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(P_{i} - O_{i}\right)^{2}} $$
    (1)

  II. Nash–Sutcliffe coefficient (ENS) (Nash and Sutcliffe 1970) is expressed as:

    $$ E_{\text{NS}} = 1 - \frac{\sum_{i=1}^{N} \left(O_{i} - P_{i}\right)^{2}}{\sum_{i=1}^{N} \left(O_{i} - \bar{O}\right)^{2}}, \quad -\infty < E_{\text{NS}} \le 1 $$
    (2)

  III. Willmott’s index of agreement (WI) (Willmott et al. 2012) is expressed as:

    $$ \text{WI} = 1 - \frac{\sum_{i=1}^{N} \left(O_{i} - P_{i}\right)^{2}}{\sum_{i=1}^{N} \left(\left|P_{i} - \bar{O}\right| + \left|O_{i} - \bar{O}\right|\right)^{2}}, \quad 0 \le \text{WI} \le 1 $$
    (3)

  IV. Legates and McCabe’s index (ELM) (Legates and Davis 1997; Legates and McCabe 1999, 2013) is expressed as:

    $$ E_{\text{LM}} = 1 - \frac{\sum_{i=1}^{N} \left|O_{i} - P_{i}\right|}{\sum_{i=1}^{N} \left|O_{i} - \bar{O}\right|}, \quad -\infty < E_{\text{LM}} \le 1 $$
    (4)

    where \( O_i \) and \( P_i \) are the ith observed and predicted values of river flow Q, N is the number of observations, and \( \bar{O} \) is the mean of the observed Q values.
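Equations (1) to (4) translate directly into NumPy. The sketch assumes that `O` and `P` are equal-length arrays of observed and predicted flows with no missing values; the function names are illustrative only.

```python
import numpy as np

def _as_arrays(O, P):
    return np.asarray(O, float), np.asarray(P, float)

def rmse(O, P):
    """Eq. (1): root mean square error."""
    O, P = _as_arrays(O, P)
    return float(np.sqrt(np.mean((P - O) ** 2)))

def nash_sutcliffe(O, P):
    """Eq. (2): squared errors relative to the spread about the observed mean."""
    O, P = _as_arrays(O, P)
    return float(1.0 - np.sum((O - P) ** 2) / np.sum((O - O.mean()) ** 2))

def willmott_index(O, P):
    """Eq. (3): Willmott's index of agreement."""
    O, P = _as_arrays(O, P)
    Ob = O.mean()
    return float(1.0 - np.sum((O - P) ** 2) / np.sum((np.abs(P - Ob) + np.abs(O - Ob)) ** 2))

def legates_mccabe(O, P):
    """Eq. (4): absolute-error analogue of the Nash-Sutcliffe coefficient."""
    O, P = _as_arrays(O, P)
    return float(1.0 - np.sum(np.abs(O - P)) / np.sum(np.abs(O - O.mean())))
```

A prediction equal to the observed mean at every step scores 0 on ENS, WI, and ELM, which is a convenient sanity check; ENS and ELM become negative when a model does worse than that baseline, whereas WI stays within [0, 1].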

3 Results and discussion

This study used daily river stage and river flow data from the Dulhunty and Herbert Rivers in Australia. As stated previously, the core of this paper is the evaluation and comparison of the CCNN and RF models for predicting daily river flow.

3.1 Development of CCNN model

Various combinations of river stage and river flow variables for the Dulhunty and Herbert Rivers were applied as input variables to the CCNN and RF models to determine the best input variables. The simplest combination, with only the water stage (H) as input, was adopted as the minimum input combination (CCNN 1 and RF 1) and served as the starting point for choosing the optimal input combination. The input combinations of river stage and river flow variables used to estimate river flow are shown in Table 3.

Table 3 Different input combinations of the models
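The lagged input combinations can be assembled from the daily series as sketched below. This is a sketch under assumptions: the text specifies only the inputs, so the prediction target is assumed here to be the next-day flow \( Q_{t+1} \) (`horizon=1`), and the helper name `lagged_inputs` is hypothetical. For example, `q_lags=[0, 1]` reproduces the inputs of CCNN 6 (Qt, Qt−1), and `h_lags=q_lags=[0, 1, 2, 3]` those of CCNN 10.

```python
import numpy as np

def lagged_inputs(H, Q, h_lags, q_lags, horizon=1):
    """Rows [H_t, H_(t-1), ..., Q_t, Q_(t-1), ...] with (assumed) target Q_(t+horizon)."""
    H, Q = np.asarray(H, float), np.asarray(Q, float)
    max_lag = max(list(h_lags) + list(q_lags))
    t = np.arange(max_lag, len(Q) - horizon)  # times with full history and a known target
    cols = [H[t - k] for k in h_lags] + [Q[t - k] for k in q_lags]
    return np.column_stack(cols), Q[t + horizon]

# Toy series (not the study data) to show the row layout of the two best CCNN inputs
H = np.arange(10.0)
Q = 10.0 * np.arange(10.0)
X6, y6 = lagged_inputs(H, Q, h_lags=[], q_lags=[0, 1])
X10, y10 = lagged_inputs(H, Q, h_lags=[0, 1, 2, 3], q_lags=[0, 1, 2, 3])
```

Rows whose lags would reach before the start of the record, or whose target would fall after its end, are dropped, so longer lag sets leave slightly fewer training samples.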

A difficult task in ANN modeling is choosing the optimal architecture, that is, determining the number of hidden layers and nodes, typically by trial and error. The appropriate network geometry and architecture depend on the problem being addressed (Kişi 2007; Kim et al. 2012; Seo et al. 2015). This study started with one hidden layer for the construction of the CCNN model, since one hidden layer can be sufficient to represent the nonlinear relationships of interest (Kumar et al. 2002). The number of hidden nodes of the CCNN model was then determined by trial and error for each input combination based on the statistical criteria.

Table 4 summarizes the statistical indices of each CCNN model during the training and testing phases. Table 4 shows that the CCNN 6 model, whose input variables are the flow discharges at times t and t − 1 (Qt, Qt−1), produced the most accurate results of all input combinations for the Dulhunty River. The CCNN 10 model, whose input variables are the river stage and river flow at times t to t − 3 (Ht, Ht−1, Ht−2, Ht−3, Qt, Qt−1, Qt−2, Qt−3), provided the most accurate results for the Herbert River. Here, the optimum structure CCNN 6 (2, 3, 1) denotes a CCNN model with two input, three hidden, and one output node, and CCNN 10 (8, 1, 1) denotes a CCNN model with eight input, one hidden, and one output node. Figure 3a, b shows the observed and predicted river flow values and the corresponding scatter plots during the testing phase for the CCNN 6 and CCNN 10 models for the Dulhunty and Herbert Rivers, respectively. Figure 3 shows that, despite some errors in predicting peak flows, the CCNN 6 and CCNN 10 models capture the nonlinear river stage and river flow behavior efficiently. This agrees with the findings of Alok et al. (2013).

Table 4 Result of the CCNN model for different input combinations of the Dulhunty and Herbert Rivers
Fig. 3

Comparative plots of the observed and predicted flow of the best CCNN models and their corresponding scatter plots during the testing phase: a Dulhunty River and b Herbert River

3.2 Development of RF model

The performance of the RF model for the different input combinations is presented in Table 5 based on the statistical measures. In the RF technique, two parameters (i.e., the number of trees and the leaf size) first need to be optimized by trial and error; Table 5 presents their optimal values. Based on the statistical criteria, the best performance was achieved by RF 9 (input variables Ht, Ht−1, Qt, Qt−1, Qt−2, Qt−3) with 49 trees and a leaf size of 5 for the Dulhunty River, and by RF 6 (input variables Qt, Qt−1) with 51 trees and a leaf size of 5 for the Herbert River.

Table 5 Result of the RF model for different input combinations of the Dulhunty and Herbert Rivers

Figure 4a, b shows the observed and predicted river flow values and the corresponding scatter plots during the testing phase for the RF 9 and RF 6 models for the Dulhunty and Herbert Rivers, respectively. Figure 4 shows that the RF 9 and RF 6 models capture the nonlinear river stage and river flow behavior successfully, in agreement with the findings of Zhao et al. (2012). Like the CCNN model, the RF model showed some errors in predicting peak flows, which is consistent with the results of Shortridge et al. (2016).

Fig. 4

Comparative plots of the observed and predicted flow of the best RF models and their corresponding scatter plots during the testing phase: a Dulhunty River and b Herbert River

3.3 Comparisons and discussions of results

In this section, the performances of the two applied models (i.e., CCNN and RF) are compared using several figures. Figure 5a, b provides Taylor diagrams of the results generated by the CCNN and RF models for the Dulhunty and Herbert Rivers, respectively. A Taylor diagram is a simple graphical comparison that displays, as points on a polar plot, the similarity between observed and predicted river flow values in terms of the correlation coefficient and the standard deviation (Taylor 2001). The ratio of the predicted to the observed standard deviation shows the relative amplitudes of the predicted and observed variations in the testing phase (Taylor 2001; Gleckler et al. 2008), and the distance of a model point from the observed point equals the centered RMSE (Taylor 2001), so the diagram indicates the degree of prediction accuracy of each model. These visualizations show that the CCNN 10 and RF 6 models generated the results closest to the observed point for the Herbert River. For the Dulhunty River, however, it is difficult to judge which of the models performs better.
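The statistics on a Taylor diagram are linked by the law of cosines, \( E'^2 = \sigma_p^2 + \sigma_o^2 - 2\sigma_p\sigma_o R \) (Taylor 2001), where \( E' \) is the centered RMSE, \( \sigma_o \) and \( \sigma_p \) are the standard deviations of the observed and predicted flows, and R is their correlation coefficient. The sketch below computes these quantities and the Cartesian position of a model point (radius \( \sigma_p \), angle \( \arccos R \)); it uses population standard deviations, and the function names are hypothetical.

```python
import numpy as np

def taylor_statistics(O, P):
    """Standard deviations, correlation, and centered RMSE of predictions P against observations O."""
    O, P = np.asarray(O, float), np.asarray(P, float)
    o, p = O - O.mean(), P - P.mean()  # centered anomalies
    sigma_o = np.sqrt(np.mean(o ** 2))  # population standard deviations (ddof=0)
    sigma_p = np.sqrt(np.mean(p ** 2))
    R = np.mean(o * p) / (sigma_o * sigma_p)
    crmse = np.sqrt(np.mean((p - o) ** 2))  # centered RMSE E'
    return sigma_o, sigma_p, R, crmse

def taylor_point(sigma_p, R):
    """Cartesian position of a model on the diagram: radius sigma_p, angle arccos(R)."""
    theta = np.arccos(np.clip(R, -1.0, 1.0))
    return sigma_p * np.cos(theta), sigma_p * np.sin(theta)
```

The observed series sits at \( (\sigma_o, 0) \), and the distance from any model point to it equals the centered RMSE, which is why the model closest to the reference point shows the best pattern agreement (mean bias is not shown on the diagram).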

Fig. 5

Taylor diagrams of RF and CCNN models for predicting the daily river flow over the training and testing phases: a Dulhunty River and b Herbert River

Figure 6 shows the performance evaluation criteria for the two models. The results in Fig. 6 indicate that all models perform efficiently in river flow prediction. Comparing the RMSE, NSE, WI, and ELM values for predicting daily river flow, the CCNN model was more accurate than the RF model for both the Dulhunty and Herbert Rivers. Figure 6 also provides a direct comparison of the CCNN and RF models in terms of residual plots and histograms for the two rivers during the testing phase; the residual plots show that the residuals (errors) of the CCNN model were smaller than those of the RF model. In determining the best input combination, the two models produced different results for both rivers (CCNN 6 and RF 9 for the Dulhunty River; CCNN 10 and RF 6 for the Herbert River). This indicates that the model architecture can be considered an important factor in determining the most effective input variables.

Fig. 6

Result of optimal RF and CCNN models for predicting the daily river flow in Dulhunty and Herbert Rivers over the testing phase

The results show that the CCNN and RF models can estimate river flow within acceptable ranges. Because applications of the CCNN and RF models remain limited in hydrologic time series modeling (e.g., streamflow, sediment, rainfall, evaporation, and groundwater), there is scope to develop and apply them further with different data groups. In addition, comparing two models (i.e., CCNN and RF) provides only a minimal basis for selecting the optimal input combination in general; a clear selection of optimal input combinations for specific environments (i.e., river, watershed, lake, and reservoir) can depend on the number of models developed and applied. Therefore, continued research is required to select and specify optimal input combinations using a wider range of ANN-based and other data-driven models.

4 Conclusion

In this study, the effectiveness of the CCNN and RF models was investigated for river flow prediction. To achieve this goal, river stage and river flow data from two gauging stations in Australia covering three years (2014/05/22 to 2017/05/21) were used. For training and testing the models, the time series for each station were divided into 80% and 20% portions, respectively. Several statistical indicators, namely RMSE, NSE, WI, and ELM, were used to compare the applied models’ performances. Based on the achieved outcomes, the performance indicators reveal that the CCNN model provides more accurate and reliable predictions than the RF model. The advantages of the RF model are that (i) it requires no feature normalization, (ii) individual decision trees can be trained in parallel, (iii) it is widely used, (iv) it reduces overfitting compared with a single decision tree, and (v) it accommodates nonlinear relationships between inputs and outputs. The RF model also has some disadvantages: (i) it is not easily interpretable, and (ii) it is not a state-of-the-art algorithm.

The advantages of the CCNN model are that the user does not need to specify the network topology in advance, the model learns much faster than the usual learning algorithms, and training is quite robust. Its disadvantages are a high potential for overfitting the training data and lower accuracy than probabilistic neural networks on small-to-medium-size problems. To improve the current models and make them particularly useful for operational river system forecasting, further research using different hydro-climatic datasets may be warranted to explore the models’ ability to predict river stage and river flow variables.