Introduction

Inflow forecasting is one of the most active research areas in surface water hydrology. A great number of relevant studies have been published, and many methods and models can be used to perform inflow forecasting (e.g., Fernando and Jayawardena 1998; Toth et al. 1999; Shamseldin and O’Connor 2001; Xiong and O’Connor 2002; Moradkhani et al. 2004; Goswami et al. 2005; Kentel 2009; Kagoda et al. 2010; Jothiprakas and Maga 2012; Liu et al. 2012; Danandeh et al. 2014; Terzi and Ergin 2014; Akrami et al. 2014; Yaseen et al. 2015; Afan et al. 2020).

Inflow forecasting is necessary for adequate water management and reservoir operation. Currently, the Brazilian National System Operator (ONS), which is responsible for operating the reservoirs of the hydroelectric power plants in Brazil, uses stochastic models to support its work. However, these models have limited precision; therefore, more efficient tools are needed to plan and operate such a system (Hidalgo et al. 2012; Santos and Silva 2014; Freire et al. 2019).

Artificial neural networks (ANNs) have already shown good results in inflow forecasting (Karunanithi et al. 1994; Campolo et al. 1999; Danh et al. 1999; Lauzon et al. 2000; Sivakumar et al. 2002; Cigizoglu 2003; Kumar et al. 2004; Cigizoglu and Kisi 2005; Cheng et al. 2005; Farias et al. 2013; Farias and Santos 2014). Therefore, they could be used as an alternative to stochastic models, or in conjunction with other models, to improve the operation of the interconnected systems.

Recently, ANN forecasting results have been improved by pre-processing the input data with a signal filter such as the wavelet transform, which decomposes the raw input time series into high-frequency (details) and low-frequency (approximation) components. Recent studies show that using a signal pre-processed by the wavelet transform improves the inflow forecasts obtained by a regular ANN (Cannas et al. 2005; Kisi 2009; Adamowski and Sun 2010; Pramanik et al. 2010; Krishna et al. 2011; Tiwari and Chatterjee 2011; Krishna 2013; Maheswaran and Khosa 2013; Wei et al. 2013; Santos et al. 2014, 2019; Honorato et al. 2018).

The main objective here is to investigate how the choice of the decomposition level of the raw signal influences wavelet-based daily inflow forecasting. Thus, this paper describes the method for discrete wavelet decomposition of time series, the daily inflow time series recorded at three different reservoirs used as case studies, and the forecasting results of wavelet-based ANN (WANN) models with inputs of different decomposition levels.

Material

Study area

The selected reservoirs are Sobradinho, 14 de Julho, and Itaipu (Fig. 1). Sobradinho Reservoir is on the São Francisco River, in Bahia State, north-eastern Brazil; the 14 de Julho Reservoir is on the Antas River, in the city of Cotiporã, Rio Grande do Sul State, southern Brazil; and Itaipu Reservoir is on the Paraná River, on the border between Brazil and Paraguay.

Fig. 1 Location of the selected reservoirs

Sobradinho has a hydroelectric plant located on the São Francisco River, 748 km from its mouth, with a drainage area of 498,968 km². This reservoir has the second largest artificial lake in the world, about 320 km long, with a water surface of 4214 km² and a storage capacity of 34.1 billion m³ at the depth of 392.50 m. This lake is the largest water reservoir of north-eastern Brazil and, through the São Francisco Hydroelectric Company, it regulates the downstream flow of the São Francisco River. The 14 de Julho Hydroelectric Plant has a generation capacity of 100 MW, a maximum height of 33.5 m and a flooded area of 6 km². Itaipu has an installed generation capacity of 14 GW, from 20 generating units of 700 MW each, with a hydraulic design head of 118 m. Its lake is the seventh largest in Brazil (1350 km²), but it has the best rate of water use for energy production among the largest Brazilian reservoirs.

Inflow data

The data used in this paper are the natural daily inflows into those reservoirs for the period from 1 January 1931 to 31 December 2010. They were obtained from the ONS, which is also responsible for forecasting and scenario generation of average daily, weekly and monthly natural flows for all hydroelectric development sites in Brazil.

Figure 2 shows the daily hydrographs of the inflows into the studied reservoirs. Sobradinho Reservoir has an average inflow of 2656 m³/s, a maximum of 18,525 m³/s and a minimum of 400 m³/s. The 14 de Julho Reservoir has an average inflow of 285 m³/s, a maximum of 6912 m³/s and a minimum of 2 m³/s. Itaipu Reservoir has an average inflow of 10,209 m³/s, a maximum of 42,322 m³/s and a minimum of 2512 m³/s. These data comprise 80 years (29,220 days), measured from 1931 to 2010. The first 77 years of the inflow data (28,124 days, 96% of the whole data set) were used for calibration and were divided into three sets for ANN training, validation and testing. The remaining three years (1096 days, 4% of the whole data set) were used for the final test. The statistical indices of these data sets are presented in Table 1 for each period.

Fig. 2 Daily hydrograph of the three studied reservoirs: (a) Sobradinho, (b) 14 de Julho and (c) Itaipu (1931–2010)

Table 1 Descriptive statistics for the daily inflow data used in the study

Methods

Discrete wavelet transform (DWT)

The DWT is generally used in the decomposition and filtering of time series (Wang and Ding 2003; Ravansalar et al. 2015), because it does not cause coefficient redundancies between the scales, and the information about the time location of certain events is not lost in the process (Daubechies 1990; Alessio 2016).

The simplest and most efficient method for calculating the DWT was introduced by Mallat (1989), in which the scale and position parameters are chosen as powers of 2. This simple algorithm implements the DWT as a filtering operation that quickly calculates the wavelet coefficients and thus decomposes the input signal into low- and high-frequency components (Misiti et al. 1996). The approximations correspond to the low-frequency components and represent the general behavior of the series, whereas the details correspond to the high-frequency components and can be understood as the noise present in the series, depending on the level of decomposition (Santos et al. 2013, 2019; Freire et al. 2019).
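The filtering view described above can be illustrated with the Haar wavelet, the shortest orthogonal filter (two coefficients). This is a minimal sketch only: the function names are hypothetical, the inflow values are invented, and Haar is chosen for brevity rather than because it is singled out in this study.

```python
import math

def haar_dwt_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.

    Assumes an even-length input; each output has half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt_step(approx, detail):
    """Inverse of haar_dwt_step: rebuilds the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal

# Toy daily inflow values, invented for illustration only.
q = [400.0, 420.0, 500.0, 480.0, 900.0, 860.0, 610.0, 590.0]
a1, d1 = haar_dwt_step(q)

# Keeping both coefficient sets reproduces the raw signal.
reconstructed = haar_idwt_step(a1, d1)

# Zeroing the details yields the level-1 approximation (A1) on the original
# time axis, i.e. a smoothed version of q without its high-frequency part.
A1 = haar_idwt_step(a1, [0.0] * len(d1))
```

Applying `haar_dwt_step` again to `a1` would give the second level, which is the iterative scheme of the decomposition tree.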

The decomposition can continue iteratively, with the approximations being decomposed in turn, so that the original signal is broken down into several lower-resolution components. This process is called the wavelet decomposition tree, as illustrated in Fig. 3. The maximum number of decompositions (lmáx) of each wavelet subfamily for the studied reservoirs was determined according to two criteria: (a) the size of the series, known as the criterion on the signal; and (b) the wavelet subfamily used, known as the entropy criterion (Misiti et al. 2006):

$$ {l}_{\mathrm{m\acute{a}x}}=\frac{\log \left(\frac{l_x}{l_w-1}\right)}{\log 2} $$
(1)

where lx is the size of the time series (lx = 29,220) and lw is the filter size associated with the orthogonal or biorthogonal wavelet.

Fig. 3 Multiple decomposition, up to the chosen maximum level, of the Sobradinho Reservoir inflow time series using one selected mother wavelet: approximations on the left and details on the right

The value of lw ranged from 2 to 102, depending on the subfamily used, so the calculated lmáx ranged from 8 to 14. Thus, 8 was chosen as the maximum level of signal decomposition, because this value is valid for all studied subfamilies. For example, Fig. 3 shows the approximations (low frequency) and the details (high frequency) at eight levels of decomposition for the daily natural inflows to Sobradinho Reservoir. A simple visual check makes it evident that approximations above level eight would depart from the hydrograph shape of the raw signal.
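As a worked check of Eq. (1), the sketch below evaluates the maximum level for the two extreme filter sizes mentioned above (lw = 2 and lw = 102) with lx = 29,220. Taking the integer part of the result is an assumption of this sketch, made because a decomposition level must be an integer; the function name is hypothetical.

```python
import math

def max_decomposition_level(series_length, filter_length):
    """Integer part of log2(lx / (lw - 1)), following Eq. (1).

    Truncating to an integer is an assumption of this sketch.
    """
    if filter_length < 2:
        raise ValueError("filter length must be at least 2")
    ratio = series_length / (filter_length - 1)
    if ratio < 1.0:
        return 0  # series too short for even one decomposition level
    return int(math.floor(math.log2(ratio)))

lx = 29220  # days in the 1931-2010 record
level_shortest_filter = max_decomposition_level(lx, 2)    # lw = 2
level_longest_filter = max_decomposition_level(lx, 102)   # lw = 102

# The level usable by every subfamily is the smaller of the two.
common_level = min(level_shortest_filter, level_longest_filter)
```

Under this assumption the two extremes give levels 14 and 8, which matches the range reported in the text and the choice of 8 as the common maximum.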

Artificial neural network (ANN)

In this paper, the ANN inputs were the inflows observed on the current day t (Qt) and on the previous four days (Qt-1, Qt-2, Qt-3 and Qt-4); thus, the input layer had five neurons. For the WANN, the inputs were the corresponding values of the approximation of the raw signal. The output layer of both models had only one neuron, which corresponds to the forecasted inflow seven days ahead (Qt+7).
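The input and output layout described above can be sketched as a sliding window over the series. The function name and the toy series are hypothetical. Pairing the windows with targets taken from the raw series is the layout for the ANN; for the WANN the same routine would be applied to an approximation series, and which series supplies the WANN target is not stated here and is left open.

```python
def make_samples(inflow, n_lags=5, horizon=7):
    """Pair each window [Q(t-4), ..., Q(t)] with the target Q(t+7).

    n_lags is the number of input neurons; horizon is the lead time in days.
    """
    inputs, targets = [], []
    # t is the index of the current day; it needs n_lags - 1 days before it
    # and `horizon` days after it.
    for t in range(n_lags - 1, len(inflow) - horizon):
        inputs.append(inflow[t - n_lags + 1 : t + 1])
        targets.append(inflow[t + horizon])
    return inputs, targets

# 20 toy daily values, invented for illustration only.
series = list(range(100, 120))
X, y = make_samples(series)
```

With 20 values, 5 lags and a 7-day horizon, only 9 complete samples exist, because the first 4 days lack a full window and the last 7 days lack a target.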

Both the ANN and the proposed WANN were feed-forward networks with 20 hidden neurons, using a sigmoid activation function in the hidden layer and a linear function in the output layer. The Levenberg-Marquardt algorithm was used as the learning algorithm, because it is considered one of the fastest training methods (Renno et al. 2015). The error on the training data set of the ANN and WANN was checked by calculating the mean square error (MSE):

$$ MSE=\frac{1}{N}\sum \limits_{t=1}^N{\left({Q}_{c_t}-{Q}_{o_t}\right)}^2 $$
(2)

where Qct and Qot are, respectively, the calculated and observed inflows at time t, and N is the number of values in the series.
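The network layout and Eq. (2) can be illustrated with a plain forward pass. This is a sketch under stated assumptions: the logistic form of the sigmoid is assumed (the text does not say which sigmoid was used), the weights are random placeholders rather than trained values, and Levenberg-Marquardt training is not shown. All names are hypothetical.

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: sigmoid hidden layer followed by one linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def mse(calculated, observed):
    """Mean square error of Eq. (2)."""
    n = len(observed)
    return sum((c - o) ** 2 for c, o in zip(calculated, observed)) / n

random.seed(0)
n_inputs, n_hidden = 5, 20  # layer sizes stated in the text

# Placeholder (untrained) weights and biases.
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

# One scaled input vector [Q(t-4), ..., Q(t)], invented for illustration.
x = [0.1, 0.2, 0.0, -0.3, 0.5]
prediction = forward(x, w_hidden, b_hidden, w_out, b_out)
```

A training algorithm would adjust the weights to minimise `mse` over the training samples.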

Performance evaluation

In the present paper, three statistical indices were used to evaluate the accuracy of the forecasting results: (a) the root mean square error (RMSE), (b) the Nash-Sutcliffe coefficient (NASH) and (c) the correlation coefficient (R):

$$ RMSE=\sqrt{\frac{1}{N}\sum {\left({Q}_c-{Q}_o\right)}^2} $$
(3)
$$ NASH=1-\frac{\sum {\left({Q}_o-{Q}_c\right)}^2}{\sum {\left({Q}_o-\overline{Q_o}\right)}^2} $$
(4)
$$ R=\frac{\sum \left({Q}_o-\overline{Q_o}\right)\left({Q}_c-\overline{Q_c}\right)}{\sqrt{\sum {\left({Q}_o-\overline{Q_o}\right)}^2}\sqrt{\sum {\left({Q}_c-\overline{Q_c}\right)}^2}} $$
(5)

where Qc is the calculated inflow; Qo is the observed inflow; \( \overline{Q_c} \) is the mean calculated inflow; and \( \overline{Q_o} \) is the mean observed inflow.

The RMSE is the square root of the mean square error (MSE), and its optimal value is RMSE = 0. The Nash-Sutcliffe coefficient, considered one of the most important statistical criteria for evaluating the accuracy of hydrological models, can range from −∞ to 1, and its optimal value is NASH = 1. The correlation coefficient can range from −1 to 1 and indicates the degree of collinearity between forecasted and observed values; if R = 1, a perfect positive linear relationship exists.
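Eqs. (3) to (5) translate directly into code. The sketch below uses only the standard library; the function names and the four-value toy series are invented for illustration.

```python
import math

def rmse(calc, obs):
    """Root mean square error, Eq. (3)."""
    return math.sqrt(sum((c - o) ** 2 for c, o in zip(calc, obs)) / len(obs))

def nash(calc, obs):
    """Nash-Sutcliffe coefficient, Eq. (4)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - c) ** 2 for c, o in zip(calc, obs))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def corr(calc, obs):
    """Correlation coefficient R, Eq. (5)."""
    mean_obs = sum(obs) / len(obs)
    mean_calc = sum(calc) / len(calc)
    cov = sum((o - mean_obs) * (c - mean_calc) for c, o in zip(calc, obs))
    norm_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    norm_calc = math.sqrt(sum((c - mean_calc) ** 2 for c in calc))
    return cov / (norm_obs * norm_calc)

# Toy observed and calculated inflows, invented for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
calc = [12.0, 18.0, 33.0, 37.0]

scores = {"RMSE": rmse(calc, obs), "NASH": nash(calc, obs), "R": corr(calc, obs)}
```

For a perfect forecast (`calc == obs`) these functions return the optimal values stated above: RMSE = 0, NASH = 1 and R = 1.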

Results and discussion

In order to improve the ANN efficiency, the input and output data were normalized and then scaled to the interval [−1, 1] before training (Demuth and Beale 2005). For each simulation, 70% of the original data were used for training, 15% for validation and 15% for testing. The forecasts of the ANN model yielded RMSE = 457.5096 m³/s, NASH = 0.8967 and R = 0.9479 for Sobradinho Reservoir; RMSE = 511.7921 m³/s, NASH = 0.0658 and R = 0.2624 for the 14 de Julho Reservoir; and RMSE = 2299.3335 m³/s, NASH = 0.7844 and R = 0.8862 for Itaipu Reservoir.
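One common way to scale data to [−1, 1] is a linear min-max mapping, sketched below together with the 70/15/15 sample counts. Both functions are hypothetical illustrations: the exact normalization routine used in the study and the way samples were assigned to each subset (random or chronological) are not specified here.

```python
def scale_to_range(values, low=-1.0, high=1.0):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]

def split_counts(n_samples, train_pct=70, val_pct=15):
    """Sample counts for training, validation and testing.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    n_test = n_samples - n_train - n_val  # the remainder goes to the test set
    return n_train, n_val, n_test

# Minimum, mean and maximum inflow reported for Sobradinho Reservoir.
scaled = scale_to_range([400.0, 2656.0, 18525.0])

# Hypothetical number of samples, chosen only to show the proportions.
counts = split_counts(1000)
```

The scaled forecasts would need the inverse mapping before being compared with observed inflows in physical units.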

In order to improve the forecasting efficiency, the raw signal was pre-processed using 54 wavelet subfamilies to break it down into approximations and details up to the maximum level set (i.e., 8). Then, the approximations were used as ANN inputs to forecast the inflows seven days ahead, totalling 432 WANN models (i.e., 54 × 8) for each reservoir.

Figure 4 shows the performance of the WANN models based on RMSE: dots below the red line mean that the WANN models outperformed the regular ANN model, whereas dots above the red line mean that they did not. With the approximations A1 to A5 as inputs, the RMSE decreases in many cases, regardless of the studied reservoir. Beyond A5, the error increases substantially, and it is not worthwhile to use the A6, A7, A8 or even higher approximations, because they did not provide any forecasting improvement. Exceptionally, for reservoirs with small inflow volumes (e.g., 14 de Julho Reservoir), the A6 and A7 approximations could be used. The same is observed when analysing the NASH (Fig. 5) and R (Fig. 6) indices. In both figures, dots above the red line mean that the WANN models outperformed the regular ANN model, and dots below the red line mean that they did not. This confirms that the A6, A7, A8 or higher approximations are not recommended as ANN inputs, because they would not improve the performance of the WANN models, except for inflow time series with small volumes, for which the A6 and A7 approximations could be used.

Fig. 4 Performance of the WANN models based on RMSE index

Fig. 5 Performance of the WANN models based on NASH index

Fig. 6 Performance of the WANN models based on R index

Figure 7 shows the quantitative success of the WANN models in relation to the ANN model, according to the results shown in Figs. 4, 5 and 6. From the quantitative point of view, the best approximation was A4, which obtained 100.00% success for all analysed indices for the 14 de Julho and Itaipu reservoirs. For the Sobradinho Reservoir, the best approximation was A1, which obtained 92.59% success for all indices, since it was successful in 50 of the 54 wavelet subfamilies evaluated. The second-best approximation for Sobradinho Reservoir was A3 with 87.04% success for all indices, followed by A4 and A2 with 83.33% and 77.78% success, respectively. Finally, A5 presented 66.67% success, whereas the other approximations (A6, A7, A8) were not successful, as stated earlier. For the 14 de Julho Reservoir, the second-best approximation was A3 with 98.15% success for all indices, followed by A5 with 96.30% success for the RMSE and NASH indices and 100.00% success for the R index, and then by A7 and A6 with 96.30% and 94.44% success for all indices, respectively. Finally, A2, A1 and A8 presented 74.07%, 38.89% and 14.81% success for the RMSE and NASH indices and 87.04%, 31.48% and 16.67% success for the R index, respectively. For the Itaipu Reservoir, the second-best approximation was A5 with 98.15% success for all indices, followed by A3 and A2 with 88.89% and 83.33% success, respectively. Finally, A6 and A1 presented 70.37% and 66.67% success, respectively, whereas the other approximations (A7, A8) were not successful, as mentioned above.
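The success percentages above are counts of WANN models that beat the ANN benchmark, divided by the 54 subfamilies. A minimal sketch of that count follows; the function name is hypothetical and the 54 RMSE values are invented so that 50 of them fall below the ANN value reported for Sobradinho Reservoir.

```python
def success_rate(wann_values, ann_value, lower_is_better):
    """Percentage of WANN models that beat the ANN benchmark on one index."""
    if lower_is_better:
        wins = sum(1 for v in wann_values if v < ann_value)
    else:
        wins = sum(1 for v in wann_values if v > ann_value)
    return 100.0 * wins / len(wann_values)

ann_rmse = 457.5096  # ANN RMSE reported for Sobradinho Reservoir

# Invented RMSE values for 54 subfamilies: 50 below the benchmark, 4 above it.
wann_rmse = [430.0] * 50 + [470.0] * 4

rate = success_rate(wann_rmse, ann_rmse, lower_is_better=True)
```

For RMSE a lower value is better, whereas for NASH and R the comparison is reversed (`lower_is_better=False`).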

Fig. 7 Quantitative success of WANN models against the ANN model, based on the performance indices (RMSE, NASH and R) for each reservoir: (a) Sobradinho, (b) 14 de Julho and (c) Itaipu

However, when the values of these indices are analysed (Table 2), the following is observed. (a) For the Sobradinho Reservoir, the RMSE ranged from 400.0489 to 473.3114 m³/s for A1, from 302.5705 to 504.5766 m³/s for A2, from 94.3232 to 537.4780 m³/s for A3, from 203.4392 to 579.0129 m³/s for A4 and from 371.0106 to 642.2430 m³/s for A5. (b) For the 14 de Julho Reservoir, the RMSE ranged from 506.9577 to 519.2857 m³/s for A1, from 350.1133 to 543.8587 m³/s for A2, from 347.3893 to 564.1586 m³/s for A3, from 378.6133 to 507.4274 m³/s for A4 and from 429.7828 to 562.1094 m³/s for A5. (c) For the Itaipu Reservoir, the RMSE ranged from 1825.7296 to 2396.7418 m³/s for A1, from 1555.2176 to 2491.0735 m³/s for A2, from 728.4548 to 2380.0655 m³/s for A3, from 1229.3971 to 2114.6630 m³/s for A4 and from 1771.7477 to 2323.0118 m³/s for A5.

Table 2 also shows the variation of all performance indices for all ANN input configurations. It can be noted from Table 2 that: (a) for the Sobradinho Reservoir, the A3 approximation improved the forecasting by up to a 79% decrease in RMSE, an 11% increase in NASH and a 5% increase in R, and the A4 approximation by up to a 56% decrease in RMSE, a 9% increase in NASH and a 4% increase in R, whereas A1 improved RMSE by only 13%, NASH by 3% and R by 1%; (b) for the 14 de Julho Reservoir, the A3 approximation improved the forecasting by up to a 32% decrease in RMSE, a 766% increase in NASH and a 189% increase in R, and the A2 approximation by up to a 32% decrease in RMSE, a 755% increase in NASH and a 186% increase in R, whereas A4 improved RMSE by only 26%, NASH by 643% and R by 167%; and (c) for the Itaipu Reservoir, the A3 approximation improved the forecasting by up to a 68% decrease in RMSE, a 25% increase in NASH and a 12% increase in R, whereas A4 improved RMSE by only 47%, NASH by 20% and R by 9%.

Table 2 Variation of the performance indices for the ANN (raw data) and WANN (A1 to A8) models for each reservoir

Thus, a qualitative analysis of the successes of the WANN models in relation to the ANN model is necessary; it is presented in Table 3. The negative sign in the RMSE column of Table 3 indicates an improvement in that index, since the best RMSE value is close to 0.0. Conversely, the plus sign in the NASH and R columns indicates an improvement in those indices, whose best values are close to 1.0. From the qualitative point of view, the best approximation was A3 for all analysed indices of every reservoir. For the Sobradinho Reservoir, it showed an improvement range from 0.06 to 79.38% for RMSE, from 0.01 to 11.03% for NASH and from 0.06 to 5.26% for R; for the 14 de Julho Reservoir, from 1.37 to 32.12% for RMSE, from 38.76 to 765.60% for NASH and from 30.37 to 189.39% for R; and for the Itaipu Reservoir, from 6.13 to 68.32% for RMSE, from 3.27 to 24.72% for NASH and from 1.80 to 11.62% for R. The second-best approximation was A4 for the Sobradinho and Itaipu reservoirs, with an improvement range from 6.13 to 55.53% and from 8.03 to 46.53% for RMSE, from 1.37 to 9.24% and from 4.24 to 19.62% for NASH and from 0.66 to 4.41% and from 2.20 to 9.31% for R, respectively. For the 14 de Julho Reservoir, the second-best approximation was A2, with an improvement range from 0.08 to 31.59% for RMSE, from 2.29 to 755.31% for NASH and from 0.44 to 186.06% for R. The A4 approximation ranked third for the 14 de Julho Reservoir, with an improvement range from 0.85 to 26.02% for RMSE, from 24.11 to 642.74% for NASH and from 34.20 to 166.77% for R, and the A1 approximation ranked fifth for the Sobradinho Reservoir, with an improvement range from 0.46 to 12.56% for RMSE, from 0.11 to 2.71% for NASH and from 0.02 to 0.83% for R.
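The improvement percentages can be read as relative changes with respect to the ANN value. That definition is inferred here rather than stated in the text, but it reproduces the upper bounds quoted above for A3 at Sobradinho Reservoir (79.38%, 11.03% and 5.26%) from the index values reported in this paper. The function name is hypothetical.

```python
def improvement_percent(ann_value, wann_value, lower_is_better):
    """Relative improvement of a WANN index over the ANN index, in percent.

    Assumed definition: change in the favourable direction divided by the
    ANN value. A positive result means the WANN model is better.
    """
    if lower_is_better:
        change = ann_value - wann_value
    else:
        change = wann_value - ann_value
    return 100.0 * change / abs(ann_value)

# ANN indices and best A3 indices reported for Sobradinho Reservoir.
rmse_gain = improvement_percent(457.5096, 94.3232, lower_is_better=True)
nash_gain = improvement_percent(0.8967, 0.9956, lower_is_better=False)
r_gain = improvement_percent(0.9479, 0.9978, lower_is_better=False)
```

Because the denominator is the ANN value, a very small ANN index inflates the percentage, which is why the NASH gains for the 14 de Julho Reservoir (ANN NASH = 0.0658) reach several hundred percent.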

Table 3 Percentage of improvement of the forecasting performance using the WANN models compared to the ANN model

Conclusions

The use of the wavelet transform to eliminate the noise present in the raw signal proved to be extremely important for improving the ANN forecast performance; i.e., the WANN models performed significantly better than the ANN model in forecasting the Sobradinho, 14 de Julho and Itaipu reservoir inflows seven days ahead.

Seven wavelet families were analysed, and the maximum decomposition level of each wavelet subfamily was calculated for the inflow data of the Sobradinho, 14 de Julho and Itaipu reservoirs based on the signal criterion (size of the series) and the entropy criterion (wavelet subfamily). The maximum level of decomposition was set to eight, because this level is valid for all the studied subfamilies.

A total of 432 WANN models were tested against a regular ANN for each reservoir. The best forecasts of the WANN models were obtained with the approximations between levels A1 and A5, of which the A4 approximation was the most successful, followed by the A3 approximation, for the 14 de Julho and Itaipu reservoirs. For the Sobradinho Reservoir, however, the A1 approximation obtained the highest success rate, followed by the A3 approximation. The A3 approximation was chosen as the best approximation to be used as ANN input, because it provided the best forecasting results for all reservoirs: (a) RMSE ranging from 94.3232 to 537.4780 m³/s, NASH ranging from 0.8575 to 0.9956 and R ranging from 0.9271 to 0.9978 for Sobradinho Reservoir; (b) RMSE ranging from 347.3893 to 564.1586 m³/s, NASH ranging from 0.1352 to 0.5696 and R ranging from 0.2217 to 0.7593 for the 14 de Julho Reservoir; and (c) RMSE ranging from 728.4548 to 2380.0655 m³/s, NASH ranging from 0.7690 to 0.9784 and R ranging from 0.8784 to 0.9893 for Itaipu Reservoir. By comparison, the indices for the A1 approximation ranged from 400.0489 to 473.3114 m³/s for RMSE, from 0.8895 to 0.9210 for NASH and from 0.9444 to 0.9609 for R for Sobradinho Reservoir, and the indices for the A4 approximation ranged from 378.6133 to 507.4274 m³/s and from 1229.3971 to 2114.6630 m³/s for RMSE, from 0.0817 to 0.4887 and from 0.8177 to 0.9384 for NASH, and from 0.3521 to 0.7000 and from 0.9058 to 0.9687 for R for the 14 de Julho and Itaipu reservoirs, respectively.

Finally, it can be concluded that by decomposing a daily inflow time series up to the fifth level and using the A5 approximation as ANN input, i.e., eliminating the D1, D2, D3, D4 and D5 details, it is often possible to obtain better forecasting results than with the raw data as input (regular ANN model). However, if the D1, D2 and D3 details can be assumed to be noise in the raw signal, then the A3 approximation can be used as ANN input, and this procedure would provide the best WANN forecasting results, regardless of the river discharge patterns and the chosen mother wavelet or wavelet subfamily.