1 Introduction

Stream-flow is well known to exhibit highly nonlinear and dynamic behavior [1–6]. Over the past several decades, stream-flow forecasting has remained an important and challenging problem [7–9]. In practical management, stream-flow forecasts are essential for water resources planning and operation. Real-time (short-term) forecasting supports reliable operation for flood control and mitigation, whereas long-term forecasting is essential for several water resources applications, including river sediment operation, reservoir and water demand sustainability, hydropower production, and other uses [10, 11].

Since the early 1970s, classical mathematical and statistical models have been applied to this problem, for instance the multiple linear regression (MLR) model and the autoregressive integrated moving average (ARIMA) model [12–16]. The main drawback of these classical models is that they are restricted to linear relationships, which makes them poorly suited to capturing the highly stochastic pattern of stream-flow. More recently, artificial intelligence (AI) techniques have been widely used to model and forecast river flow time series, including classifiers and machine learning approaches [2, 17–28], fuzzy logic systems [4, 29–33], evolutionary computation [34–38], and wavelet complementary models [39–44]. Despite their flexibility and utility in modeling stream-flow, AI techniques still have drawbacks and limitations (e.g., over-fitting, slow learning speed, local minima, and difficulty in capturing the high complexity, non-stationarity, dynamism and nonlinearity of the time series). Nevertheless, these mathematical models remain advantageous over physical models, which usually require considerable effort and information to model the hydrological variables of a specific watershed [45–47].

In the last decade, a non-tuned machine learning algorithm known as the extreme learning machine (ELM) has proven notably efficient in regression problems [48–52]. The approach was first proposed by [53] and has been widely used because of its characteristics (i.e., random assignment of the internal network weights and short learning time). Applications include evapotranspiration prediction [54], fast object recognition and image classification [55, 56], landslide displacement prediction [57], sales prediction in fashion retailing [58], melting point prediction of organic compounds [59], big data classification [60], and the use of prior knowledge [48]. In relation to the present application, a monthly stream-flow prediction model was built by integrating ELM with wavelet decomposition for a case study in southwestern China [61]. In 2016, Deo and Sahin [62] implemented an ELM model for stream-flow forecasting in Queensland to validate its superiority over artificial neural network (ANN) models. An online sequential version of ELM was investigated for short-term river flow forecasting in a Canadian region [63]. In a semi-arid region, ELM gave promising results for long-term stream-flow forecasting [64]. In general, ELM as a non-tuned approach has shown high accuracy relative to state-of-the-art AI models.

Compared with other AI techniques, the ELM method offers strong modeling capabilities, such as randomly assigned internal weights, a much faster learning process and a very simple network architecture. Moreover, it overcomes the shortcomings of the popular traditional gradient-based learning algorithms.

Because there are numerous neural network architectures, researchers still face the question of which network is best suited to a specific problem. There is no general answer, because methods capable of modeling hydrological processes are still being explored. This research investigates the proficiency of the ELM method in forecasting hydrological time series one step ahead with an acceptable level of accuracy, while overcoming the highlighted drawbacks of existing AI techniques (e.g., the time-consuming learning process and the need for human intervention). Daily, average weekly and average monthly stream-flow data are used to examine the proposed ELM method as an alternative model for stream-flow forecasting in a tropical environment, the Johor River, Malaysia. The Johor River is one of the essential rivers in Peninsular Malaysia; it supplies Johor state with water in addition to various domestic and agricultural uses. Studying the flow pattern of this river is therefore highly significant for its sustainability. To examine the proposed ELM method, its performance is compared with that of the classical ANN method. The remainder of the manuscript is organized as follows. Section 2 describes the ANN method and its algorithmic expressions. Section 3 gives comprehensive details of the proposed ELM method. The case study, the data preparation and the model structure are presented in Sect. 4. Section 5 presents the results achieved by the ELM method and discusses the comparison with the classical ANN method. Finally, the conclusions of the research are given in Sect. 6.

2 Artificial neural network (ANN) approach

ANN is an advanced mathematical model that handles problems in a manner inspired by the human brain. The theoretical components of an ANN were reported by Task and Neural (2000): (i) the ANN structure consists of single elements called nodes, and information processing occurs in each node; (ii) connection links transfer signals between the nodes; (iii) each connection link has an associated weight that represents its connection strength; and (iv) an activation function in each node determines its output signal. In general, the ANN architecture has three main elements: an input layer with one or more input nodes, depending on the number of model parameters; a single hidden layer containing the activation function; and an output layer.

Several algorithms have been used in the learning process of ANNs in hydrological applications. According to the existing literature, the radial basis function neural network (RBFNN) method is superior to the other training methods because it offers higher reliability, faster convergence and smaller extrapolation errors [65–67]. The RBFNN model was proposed by Lowe and Broomhead [52]. The RBFNN is structured in three layers (input, hidden and output), as shown in Fig. 1. Each layer has its own function in carrying out the assigned task, which in this research is forecasting stream-flow. The first layer passes the input variables into the RBFNN. The second layer applies a nonlinear transformation between the input variables and the neurons (radial basis function nodes in the hidden layer). Finally, a linear transformation maps the hidden-layer space to the output layer, which gives the desired variable.

Fig. 1 Radial basis function algorithm structure in the artificial neural network

The RBF functions \(\varphi_{1}, \varphi_{2}, \ldots, \varphi_{N}\) are known as basic hidden transfer functions, while \(\{ \varphi_{i}(x)\}_{i = 1}^{N}\) is termed the intermediate hidden domain. One constraint of this architecture is that the number of RBF functions (N) that forward the input variables from the input layer to the hidden layer must be less than the number of available data records representing the input–output pattern. The most popular RBF in such pattern recognition applications is the Gaussian function; the following formula gives the Gaussian representation in a one-dimensional domain:

$$\varphi ( x , \mu ) = e^{{ - \frac{{\left\| {x - \mu } \right\| ^{2} }}{{2d^{2} }}}}$$
(1)

where μ is the center of the Gaussian function, which represents the mean value of x, and d is the spread (width) of φ(x, μ) around that center.

There are two key parameters, namely the center μ and the spread d. Both are initialized at the start of the modeling process and then adjusted during training. According to the Gaussian radial function, a hidden unit is more sensitive to data points near its center. This sensitivity can be controlled by changing the initial value of the spread d. An example of the Gaussian radial basis is shown in Fig. 2, which shows that the radial basis function is less sensitive to the input data pattern when the spread value is relatively large.
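The effect of the spread can be illustrated numerically. The following is a minimal sketch of the Gaussian basis of Eq. (1), written with the conventional negative exponent so that the response decays away from the center. The function name `gaussian_rbf`, the center value 0.5 and the sample inputs are illustrative assumptions; the spreads 0.35 and 0.8 are values reported later in this paper, while 0.1 is added only for contrast.

```python
import math

def gaussian_rbf(x, mu, d):
    """Gaussian radial basis response for a scalar input x, center mu, spread d."""
    return math.exp(-((x - mu) ** 2) / (2.0 * d ** 2))

# Response of one hidden unit centered at 0.5 (assumed) for three spreads.
center = 0.5
for spread in (0.1, 0.35, 0.8):
    responses = [gaussian_rbf(x, center, spread) for x in (0.5, 0.6, 0.9)]
    print(spread, [round(r, 3) for r in responses])
```

With a small spread the response falls off quickly as the input moves away from the center, whereas with a large spread the unit responds almost equally to all three inputs, which is the reduced sensitivity described above.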

Fig. 2 RBFNN with different levels of spread. a Normal spread. b Small spread. c Large spread

3 Non-tuned machine learning: ELM Approach

The extreme learning machine (ELM) was first proposed by [53], originally for single hidden layer feedforward neural networks (SLFNs). It was later extended to generalized SLFNs, in which the hidden layer need not be neuron-like [68, 69]. The main feature that makes ELM a strong alternative is that the hidden layer does not need to be tuned. In addition, it seeks the minimum norm of the output layer weights and requires fewer parameter settings, its training is extremely fast compared with gradient descent learning algorithms, and it shows good generalization performance [52].

This article investigates the capability of the ELM to forecast stream-flow using time series at different time intervals (daily, average weekly and average monthly). Different input combinations of antecedent records are supplied to the SLFN-ELM to forecast one step ahead. Mathematically, the input vector, consisting of different lag times (Q(t−1); Q(t−1), Q(t−2); and Q(t−1), Q(t−2), …, Q(t−n)), is mapped through randomly generated input weights to an L-dimensional ELM random feature space, and the network output (forecasted Q(t)) can be expressed as:

$$f_{L} \left( x \right) = \mathop \sum \limits_{i = 1}^{L} h_{i} \left( x \right) \beta_{i} = h\left( x \right) \beta$$
(2)

where \(\beta = [\beta_{1} ,\beta_{2} , \ldots ,\beta_{L} ]^{T}\) is the vector of weights between the hidden layer and the output layer, and \(h(x) = [g_{1}(x), \ldots ,g_{L}(x)]\) is the hidden layer output vector, i.e., the randomly generated features of the input vector. L is the number of hidden neurons, and \(g_{i}(x)\) is the output of the ith hidden node. In this research, the models were developed with the sigmoid activation function, which can be expressed as:

$$g_{i} \left( x \right) = {\text{SigAct}} \left( {x,a_{i} ,b_{i} } \right) = \frac{1}{{1 + { \exp }\left( { - \left( {a_{i} x + b_{i} } \right)} \right)}}$$
(3)

where \(a_{i}\) and \(b_{i}\) are the random input weights and bias between the input nodes and the ith hidden node.

The ELM model solves the learning problem \(H\beta = T\). Here, \(T = [T_{1}, \ldots, T_{N}]^{T}\) is the target output matrix and \(H = [h^{T}(X_{1}), h^{T}(X_{2}), h^{T}(X_{3}), \ldots, h^{T}(X_{N})]^{T}\). The output weight β is determined using \(\beta = H^{\dag } T\), where \(H^{\dag }\) is the Moore–Penrose generalized inverse of matrix H.

When developing a stream-flow forecasting model, implementation efficiency is important. The main advantage of the ELM approach is that it noticeably reduces the computational time of training, because its mathematical formulation avoids iterative gradient descent steps. Furthermore, most of the remaining computation time is spent calculating the weights that link the input variables to the desired output through the least-squares solution of a linear system. In addition, the ELM method employs singular value decomposition (SVD) as an adaptable and numerically stable technique for computing the Moore–Penrose generalized inverse. Finally, 30 hidden-layer nodes were selected by trial and error to balance the statistical evaluation metrics across both model phases (training and testing).
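The training procedure described in this section can be sketched in a few lines. The following is an illustrative sketch rather than the implementation used in this study: it assumes NumPy, draws the random input weights and biases uniformly from [−1, 1] (the distribution is not stated in the text), and uses a synthetic sine-plus-noise series in place of the Johor River records. The function names `train_elm` and `predict_elm` are hypothetical. NumPy's `np.linalg.pinv` computes the Moore–Penrose inverse through SVD, in line with the description above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T, n_hidden=30, seed=0):
    """Fit an ELM: random, untuned hidden layer and least-squares output weights."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # random hidden biases (assumed range)
    H = sigmoid(X @ a + b)                                    # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                              # minimum-norm solution of H beta = T
    return a, b, beta

def predict_elm(X, a, b, beta):
    return sigmoid(X @ a + b) @ beta

# Synthetic stand-in for a normalized flow series (not the study data).
rng = np.random.default_rng(1)
q = 0.5 + 0.4 * np.sin(np.linspace(0.0, 20.0, 400)) + 0.02 * rng.standard_normal(400)

# Three antecedent values Q(t-1), Q(t-2), Q(t-3) as inputs and Q(t) as target.
X = np.column_stack([q[2:-1], q[1:-2], q[:-3]])
T = q[3:]

n_train = int(0.9 * len(T))  # leading 90% for training, trailing 10% for testing
a, b, beta = train_elm(X[:n_train], T[:n_train])
forecast = predict_elm(X[n_train:], a, b, beta)
test_rmse = float(np.sqrt(np.mean((T[n_train:] - forecast) ** 2)))
print(f"test RMSE on the synthetic series: {test_rmse:.4f}")
```

Only `beta` is learned, and it is obtained in a single linear-algebra step; this is why no iterative weight adjustment appears anywhere in the sketch.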

4 Case study and data preparation

In the present research, daily, weekly and monthly stream-flow data from the Johor River Basin, Malaysia, were used. Figure 3 shows the location of the case study. The drainage area of the river is around 2640 km² and its total length is 123 km. The observed inflow data cover a 20-year period (1989–2008), equal to 7300 days, 1040 weeks and 240 months. The historical data (Fig. 4), recorded at Rantau Panjang station, were obtained from the Department of Irrigation and Drainage (DID). The first month of the year is January, and the last is December. The data were divided into 90% for training and 10% for testing. This division was set by trial and error until the best forecasting accuracy was obtained [11, 67, 70, 71].
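The 90/10 division can be checked with a short sketch. It assumes a chronological split in which the trailing 10% of each record forms the testing phase, consistent with the 2007–2008 testing period reported in Sect. 5; the function name `chronological_split` is illustrative, and the series are placeholder index lists rather than flow values.

```python
def chronological_split(series, train_fraction=0.9):
    """Split a time series into a leading training part and a trailing testing part."""
    n_train = int(round(len(series) * train_fraction))
    return series[:n_train], series[n_train:]

# Record lengths reported for the daily, weekly and monthly series.
for name, n in (("daily", 7300), ("weekly", 1040), ("monthly", 240)):
    train, test = chronological_split(list(range(n)))
    print(name, len(train), len(test))
```

Under this assumption the testing phase holds 730 days, 104 weeks and 24 months, i.e., two years of record at each time scale.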

Fig. 3 Location of the case study (Johor River Basin, Peninsular Malaysia)

Fig. 4 Actual stream-flow records for Johor River Basin at Rantau Panjang station. a Daily. b Average weekly. c Average monthly

In modeling a regression problem such as stream-flow forecasting, selecting appropriate lag times as input variables is one of the essential priorities [6]. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were applied to the selected time series to determine the most strongly correlated lag times and to support effective modeling. Figure 5 shows the ACF and PACF for all stream-flow intervals (daily, average weekly and average monthly) at various lag times. Based on the behavior of the ACF and PACF in Fig. 5, three lags, four lags and two lags were considered to forecast one step ahead for the daily, weekly and monthly series, respectively. Hence, in the current study, various models were established with different input combinations based on the time scales and lag times of the stream-flow. The input combinations for all time intervals are presented in Table 1. Another significant concern is data preprocessing: the data were normalized to the range 0–1 to give a regular and balanced data range, using the following formula:

$$x_{\text{new}} = \left( {x - x_{ \hbox{min} } } \right)/\left( {x_{ \hbox{max} } - x_{ \hbox{min} } } \right)$$
(4)

Here, x is the actual record, \(x_{\min}\) is the minimum value of the data set, and \(x_{\max}\) is the maximum value of the data set.
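Equation (4) and its inverse can be written as a short sketch. The function names and the four flow values below are hypothetical and serve only to show the mapping. Whether the study took the minimum and maximum from the full record or from the training phase alone is not stated, so the sketch simply uses the values it is given.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] as in Eq. (4); also return the extremes for inversion."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are identical; Eq. (4) is undefined")
    return [(x - x_min) / span for x in values], x_min, x_max

def denormalize(scaled, x_min, x_max):
    """Map normalized forecasts back to the original flow units."""
    return [s * (x_max - x_min) + x_min for s in scaled]

flows = [12.0, 48.5, 30.0, 95.2]  # hypothetical flows in m3/s
scaled, lo, hi = min_max_normalize(flows)
print([round(s, 3) for s in scaled])
print([round(v, 1) for v in denormalize(scaled, lo, hi)])
```

Keeping `x_min` and `x_max` alongside the scaled series matters in practice, because the forecasts have to be converted back to m³/s before error measures in physical units are computed.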

Fig. 5 Plots of ACF and PACF of the stream-flow time series with 95% confidence bounds (the red lines), (a, b) daily, (c, d) average weekly, and (e, f) average monthly

Table 1 The input combinations for all the time horizons
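The lag analysis and the construction of the input combinations can be sketched as follows. This is an illustrative sketch only: `sample_acf` computes the standard sample autocorrelation, the bound ±1.96/√n is the usual large-sample approximation of the 95% limits (assumed here, since the exact bound used in Fig. 5 is not stated), and `make_lagged_inputs` is a hypothetical helper that arranges the antecedent values Q(t−1), …, Q(t−n) against the target Q(t). The short series is invented for demonstration.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def make_lagged_inputs(series, n_lags):
    """Rows of X hold [Q(t-1), ..., Q(t-n_lags)]; y holds the matching Q(t)."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y

q = [3.0, 4.5, 6.0, 5.0, 3.5, 3.0, 4.0, 6.5, 7.0, 5.5, 4.0, 3.5]  # invented series
acf = sample_acf(q, max_lag=3)
bound = 1.96 / math.sqrt(len(q))  # approximate 95% confidence bound
print([round(r, 3) for r in acf], round(bound, 3))

X, y = make_lagged_inputs(q, n_lags=3)
print(X[0], y[0])  # first pattern: [6.0, 4.5, 3.0] -> 5.0
```

Lags whose autocorrelation falls inside the bound carry little linear information about Q(t), which is the reasoning behind restricting each time scale to a small number of antecedent values.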

5 Application and analysis

This section discusses the results of applying the ELM model in the training and testing phases in comparison with ANN modeling. Modeling accuracy is assessed in terms of the error between the observed and forecasted values. Several performance measures were used for evaluation [72, 73]. With reference to established research on evaluating hydrological models, Legates and McCabe (1999) stated that “goodness-of-fit” should be inspected with both a regression coefficient and absolute error statistical measures [74]. Thus, in this research, the coefficient of determination (\(R^{2}\)), root-mean-square error (RMSE), mean absolute error (MAE) and relative error (RE) were used to evaluate the performance of the proposed approach. These indicators are expressed as:

$$R^{2} = \left[ {\frac{{\mathop \sum \nolimits_{t = 1}^{N} \left( { S_{o} - \bar{S}_{o} } \right) \left( { S_{p} - \bar{S}_{p} } \right) }}{{\sqrt {\mathop \sum \nolimits_{t = 1}^{N} \left( { S_{o} - \bar{S}_{o} } \right)^{2} \mathop \sum \nolimits_{t = 1}^{N} \left( { S_{p} - \bar{S}_{p} } \right)^{2} } }}} \right]^{2}$$
(5)
$${\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left( { S_{o} - S_{p} } \right)^{2} }$$
(6)
$${\text{MAE}} = \frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left| { S_{o} - S_{p} } \right|$$
(7)
$${\text{RE}} = \left[ {\frac{{ S_{o} - S_{p} }}{{S_{o} }}} \right] \times 100$$
(8)

In Eqs. (5)–(8), \(S_{o}\) is the observed stream-flow, \(S_{p}\) is the predicted stream-flow, \(\bar{S}_{o}\) and \(\bar{S}_{p}\) are their mean values, and N is the number of records in the data set.
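The four indicators can be computed directly from paired observed and predicted series. The sketch below is one straightforward reading of Eqs. (5)–(8), with \(R^{2}\) taken as the squared correlation between observed and predicted values; the function names and the four sample values are illustrative assumptions, not data from this study.

```python
import math

def r_squared(obs, pred):
    """Squared correlation between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def relative_error(obs, pred):
    """Per-record relative error in percent; positive values mean underestimation."""
    return [100.0 * (o - p) / o for o, p in zip(obs, pred)]

observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical flows in m3/s
predicted = [12.0, 18.0, 33.0, 39.0]
print(round(r_squared(observed, predicted), 4))
print(round(rmse(observed, predicted), 3))   # 2.121
print(mae(observed, predicted))              # 2.0
print(relative_error(observed, predicted))   # [-20.0, 10.0, -10.0, 2.5]
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives some indication of whether the error is dominated by a few peak events.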

Before discussing the results, it is worth briefly describing the structure of the models, because neural network topology affects both the computational complexity and, most importantly, the accuracy. The RBFNN algorithm is quite simple compared with others (e.g., FFNN or MLP). The most significant parameters to be determined (as described in Sect. 2) are the spread value and the number of radial basis functions. Because there is no general methodology or guideline for obtaining them, both were found by trial and error until the target accuracy (MSE) was reached. The optimum spread values were 0.35, 0.6 and 0.8 for the daily, weekly and monthly time scales, respectively, whereas the number of radial basis functions was 30 for all intervals.

After the forecasting models were established, the performance statistics of the ANN and ELM models were compared over the training and testing phases. Table 2 gives the performance indicators of the ANN forecasting model for the three time horizons. The best RMSE, MAE and \(R^{2}\) values were obtained for daily forecasting with three lags: 7.894 m³/s, 0.311 m³/s and 0.914 for training, and 11.63 m³/s, 0.4305 m³/s and 0.9076 for testing. Table 3 presents the proposed ELM approach, which also gave its best statistical measures for daily stream-flow forecasting: RMSE, MAE and \(R^{2}\) of 2.372 m³/s, 0.084 m³/s and 0.967 for training, and 2.7804 m³/s, 0.1029 m³/s and 0.9422 for testing. The proposed approach thus showed a noticeable improvement in accuracy for all time horizons, most markedly for daily flow forecasting. The results also indicate that performance in the training phase is better than in the testing phase. As expected from the statistical methods (ACF and PACF) used to determine the dimension of the input vectors, Tables 2 and 3 show the best performance for the input combinations identified in advance. The best RMSE, MAE and \(R^{2}\) were obtained with three antecedent values for the daily series and two antecedent values for both the weekly and monthly time scales. This held for both the ELM and ANN approaches. Another important observation concerns the time consumed in the testing period (“validation phase”): ELM executed noticeably faster than ANN. This agrees with [53], who reported that ELM runs fast owing to its tuning-free mechanism.

Table 2 Performance evaluation criteria of the ANN approach for the daily, weekly and monthly time scales
Table 3 Performance evaluation criteria of the ELM approach for the daily, weekly and monthly time scales

To better visualize forecasting accuracy, the stream-flow forecasted by the ANN and ELM models is plotted against the observed records in Fig. 6. Figure 6 covers the testing period (2007–2008), which represents 10% of the whole time series (as mentioned in Sect. 4). Both ANN and ELM forecasts show generally good agreement with the observed stream-flow in this study area, although neither model performed very well for some peak flow events.

Fig. 6 Comparison between observed and forecasted stream-flow for one-step-ahead (testing phase) using ANN and ELM methods. a Daily. b Average weekly. c Average monthly

To assess the reliability and effectiveness of the ELM model, the relative error (Eq. 8) was computed for the extreme peak flow events at all intervals. Table 4 shows the forecasted peak flows for all time scales using both models (ANN and ELM). According to this table, ELM is more accurate than ANN. For the maximum daily peak flow, ELM forecast 488.327 m³/s against an observed 536.358 m³/s, an underestimation of 8.955%, while ANN forecast 460.643 m³/s, an underestimation of 14.116%. For the maximum average weekly flow of 286.7233 m³/s, ELM forecast 233.47 m³/s, an underestimation of 18.573%, while ANN forecast 225.347 m³/s, an underestimation of 21.406%. Finally, the maximum average monthly flow was underestimated by 7.285% and 21.316% by the ELM and ANN models, respectively. Overall, the ELM model performed better than the ANN model for all time series intervals.
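The peak-flow percentages quoted above follow from the relative error formula, as the short check below illustrates. It uses only the daily and weekly values stated in the text (the monthly flows are not quoted, so they are omitted); the function name and dictionary layout are illustrative.

```python
def relative_error_percent(observed, forecast):
    """Relative error in percent; positive values indicate underestimation."""
    return 100.0 * (observed - forecast) / observed

# (observed peak, ELM forecast, ANN forecast) in m3/s, as quoted in the text.
peaks = {
    "daily": (536.358, 488.327, 460.643),
    "weekly": (286.7233, 233.47, 225.347),
}
for scale, (obs, elm, ann) in peaks.items():
    print(scale,
          round(relative_error_percent(obs, elm), 3),
          round(relative_error_percent(obs, ann), 3))
```

The computed values agree with the reported percentages to within rounding in the last digit.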

Table 4 Comparison of the ANN and ELM relative errors for the peak flows in the testing phase, Johor River, Malaysia

Figure 7 shows scatter diagrams of the one-step-ahead stream-flow forecasts from the ANN and ELM models for the testing phase, covering the three time horizons.

Fig. 7 The observed versus the simulated stream-flow for the testing phase using ANN and ELM models, (a, b) daily, (c, d) average weekly, and (e, f) average monthly, respectively

For a more detailed comparison between the ANN and ELM models, the relative error distribution (RE, Eq. 8) was studied over the testing period. Figure 8 shows the results for the three time scales using the ANN and ELM models. On the daily basis, as shown in Fig. 8b, the relative error over the testing phase was strongly reduced in comparison with ANN modeling; the maximum RE value decreased by up to 30%. On the weekly and monthly bases, Fig. 8d, f shows the distribution of RE; a careful examination shows that the error pattern of the ELM model is similar to that of the ANN model, but the relative error values are smaller.

Fig. 8 Relative error distribution for the testing phase using ANN and ELM methods, (a, b) daily, (c, d) average weekly, and (e, f) average monthly, respectively

In light of the above discussion, daily stream-flow forecasting generally outperforms the other time horizons (weekly and monthly). This is due to the abundance of historical records on the daily basis, which provide more information on the natural behavior of the flow. The model can thus capture most of the nonlinearity of the stream-flow pattern, which gives lower forecasting error. In addition, long-term stream-flow is influenced by several unsystematic hydrological variables that introduce uncertainty into time series modeling. Furthermore, a 123 km river has a short flow travel time; hence, weekly or monthly stream-flow cannot be modeled accurately without exogenous predictors (e.g., rainfall, humidity, wind speed and temperature). Finally, the results agree well with recent research using the non-tuned machine learning approach (i.e., ELM) [61–64].

6 Conclusion and future research

In this study, the accuracy of the ELM model was investigated for one-step-ahead short-term and long-term stream-flow forecasting in a tropical environment, the Johor River, Malaysia. According to the statistical measures (\(R^{2}\), RMSE, MAE and RE) used to evaluate the forecasting models, the proposed ELM approach outperformed the ANN approach. This agrees with several studies in the literature reporting that ELM can yield better performance than existing predictive models in stream-flow forecasting. In addition, this investigation establishes a modern methodology that offers a promising alternative for hydrological applications.

Future research efforts should be devoted to the following:

  1. Model development involving data preprocessing with the wavelet transform [75] or fast orthogonal search [76] might be applied to and examined with the ELM method to improve accuracy.

  2. Seeking a new computational method for the Moore–Penrose generalized inverse could be a good step for further research in this field. The complete orthogonal decomposition (COD) method proposed by [77], described as a lightweight and reliable alternative to SVD, might improve the computational efficiency of the ELM approach.

  3. External variables that have a correlated or even causal relationship with the stream-flow time series might substantially improve modeling accuracy. For instance, climatological data (e.g., rainfall, humidity and temperature) need to be investigated as input parameters to predict stream-flow.