Introduction

Precipitation estimates are essential for the management of water resources and for creating sustainability strategies for these resources across widely varied applications, such as agriculture, industry, water supply, energy production (hydroelectricity) and waterway transport, especially in extreme weather conditions. However, according to Michot et al. (2019), practical and accurate forecasts may encounter barriers related to the quality of the data (gaps and failures), the length of the historical series and the number of rainfall stations available. Thus, the use of effective methods for estimating precipitation is essential. Artificial intelligence (AI) methods are potentially useful approaches to simulate precipitation (Fahimi et al., 2017; Nourani et al., 2014). According to Sulaiman et al. (2018), this utility is due to the remarkable flexibility of AI methods in modelling highly nonlinear systems and stochastic patterns, and to the fact that these methods do not require prior knowledge of the behaviour of the processes being measured.

According to Shoaib et al. (2016), AI methods such as artificial neural networks (ANNs) are able to establish a relationship between historical inputs (precipitation, streamflow, water levels, etc.) and the desired outputs. This relationship is built through a nonlinear function composed of several factors that are adjusted to the observed data, allowing prediction, as adopted by Jiménez and Collischonn (2015), Santos et al. (2016), Nourani et al. (2017), Shoaib et al. (2018), Honorato et al. (2018) and Mendonça et al. (2021). ANNs are widely used for predicting hydrological variables; however, a single ANN model may not be able to deal with the nonstationary behaviour of time series if the input is not pre-processed (Cannas et al., 2006; Hu et al., 2018; Islam & Sivakumar, 2002). In this sense, wavelet transformation (WT) is a pre-processing methodology capable of filtering and correcting the information contained in the time series of input data (Zeri et al., 2018). According to Nourani et al. (2014), this correction of the inputs considerably improves the efficiency of ANN models in predicting hydrological variables. He et al. (2015) combined WT with a feedforward backpropagation ANN to forecast monthly rainfall in Australia, concluding that the combined model forecast better than the other models compared. Partal et al. (2015) obtained good results by combining WT with three types of ANNs (feedforward backpropagation, radial basis function and generalized regression neural network) for daily precipitation prediction.

It is possible to develop these ANN-based prediction models and combine pre-processing tools using a number of variables, such as temperature, radiation and humidity, as inputs. However, few stations are equipped to measure these variables, especially in developing countries, for economic and technical reasons (Altunkaynak & Nigussie, 2015). Therefore, it is advisable to develop a model that can simulate daily precipitation based on previous records of its own historical series. Furthermore, the minimum amount of input data, which functions as a memory in ANN models and acts on network learning, is still a matter of concern and needs to be investigated, as demonstrated by Shoaib et al. (2016). Hu et al. (2018) inserted an ANN into LSTM models to simulate the rainfall-runoff process based on flood events from 1971 to 2013 in the Chinese Fen River basin, obtaining satisfactory results with the use of LSTM. Salman et al. (2018) built an LSTM model with an ANN to predict meteorological variables at the Hang Nadim airport in Indonesia, demonstrating that several input layers with different time delays improve the prediction of observed variables. Hammad et al. (2021) developed a new wavelet-coupled multiple order time delay (WMTLNN) ANN model for rainfall prediction in Indus basins, Pakistan. They found that the different inputs with time delays and wavelet pre-processing improved the precipitation forecast in the evaluated basins.

Thus, in this study, daily precipitation was estimated through a hybrid model based on a new concept: several time-delay input layers, pre-processed by the maximum overlap discrete wavelet transform (MODWT), feed an adaptive neuro-fuzzy inference system (ANFIS) network. The ANFIS network has been combined with other techniques and has stood out among neural networks due to its good performance in predicting hydrological variables, especially when compared with other models (Choubin et al., 2016; Ahmadlou et al., 2019; Pham et al., 2020; Ebrahimi-Khusfi et al., 2021). The MODWT-ANFIS model was applied to the Amazon basin, which depends on precipitation to sustain its economic activities and which also influences regional and global atmospheric circulation. Precipitation data observed by the National Water Agency (ANA) and from the satellite-based Climate Prediction Center morphing technique (CMORPH) product were adopted. In this way, the models can be applied even in the absence of monitoring by specific precipitation stations.

Material and methods

Study area and database

The Amazon area is approximately 5,015,067.75 km², corresponding to approximately 58.9% of the Brazilian territory (IBGE, 2010). The region has an extensive and dense hydrographic network formed by the largest river in the world, the Amazon, with a length of 6,400 km, of which approximately 3,220 km is within Brazil. Including discharges from its various tributaries, the Amazon River is responsible for 60% of Brazil’s water availability and approximately 20% of the flow of all freshwater in the world (Davidson et al., 2012). According to data from Mapbiomas (2016), the Amazon has three characteristic biomes: (i) the Amazon biome (AB), which is the most representative, occupying 83.86% of the region; (ii) the Cerrado biome (CB), located to the east (E) and southeast (SE), corresponding to 14.32%; and (iii) the Pantanal biome (PB), located to the southwest (SW), representing only 1.82% of the total area (Fig. 1). In addition, there are transition areas between these biomes: the Amazon-Cerrado Ecotone (EAC), the longest at approximately 6,240 km, extending from the SE to the SW of the region, and the Amazon-Pantanal (EAP) and Amazon-Pantanal-Cerrado (EAPC) Ecotones (Fig. 1).

Fig. 1 Amazon and rainfall gauge station locations

In the context of regional circulation, the forest plays an important role as a source of moisture generation for other regions of Brazil (midwest, southeast and south) and for the South American (SA) continent (Ciemer et al., 2018; Silveira et al., 2017). The Amazon deforested area is 15.19% of the total area, concentrated on the southern and eastern edges of the region, known as the “arc of deforestation” (Fig. 1). This deforestation process is mainly caused by the replacement of forest cover by livestock, agricultural and agro-industrial activities (Lima et al., 2019; Vale et al., 2019).

The time series of six rainfall stations (Table 1 and Fig. 1) monitored by the ANA (available at http://www.snirh.gov.br) were used. Daily precipitation data from the CMORPH product were obtained for the location of each rainfall station. The choice of stations prioritized series with minimal gaps (average of 0.1% of the total observed data), and the period observed was 19 years (1998–2016). Precipitation from the stations operated by ANA is a point measurement recorded every 24 h. The information produced by CMORPH has a spatial resolution of 8 km (at the Equator) and is recorded every 30 min. These differences motivated the use of both databases, as did the possibility of substituting satellite data where point monitoring is absent, which is common in some places in the Amazon.

Table 1 Data from ANA rainfall stations and average daily rainfall

Maximum overlap discrete wavelet transform

For Daubechies (1992), the central idea of WT is the decomposition of the signal at different time scales as a set of basic functions (mother wavelet), revealing information from the original data, such as trends, disintegration points and discontinuities, which the raw signal does not expose (Holdefer & Severo, 2015; Zeri et al., 2018). The WT is divided into two types: continuous wavelet transform (CWT) and discrete wavelet transform (DWT) (Addison et al., 2001; Daubechies, 1992); however, as hydrometeorological data are usually recorded at discrete time intervals, the DWT is preferentially adopted in the decomposition of hydrological time series (Mehr et al., 2014; Ramana et al., 2013). Among the existing WTs, the maximum overlap discrete wavelet transform (MODWT) has stood out in time series decomposition. This is due to its ability to account for the boundary conditions (BC) involved in data decomposition, thus avoiding errors that may be introduced throughout the development of the proposed forecasting model. Bašta (2014), Quilty et al. (2016) and Du et al. (2017) demonstrated how BCs influence the decomposition of time series and how they can produce incorrect predictions if not properly treated.

The MODWT definition is derived from the DWT definition, where \({h}_{j,k}\) is the DWT wavelet filter and \({g}_{j,k}\) is the scale filter, with k = 1, … running over the filter length (L) and j denoting the level of decomposition. The MODWT wavelet filter \({\tilde{h}}_{j,k}\) and the MODWT scale filter \({\tilde{g}}_{j,k}\) are defined as \({\tilde{h}}_{j,k}={h}_{j,k}/{2}^{j/2}\) and \({\tilde{g}}_{j,k}={g}_{j,k}/{2}^{j/2}\). Thus, the j-level MODWT wavelet and scale coefficients are defined as the convolution of the time series (Xt) with the MODWT filters, as given by Eqs. (1) and (2).

$${\tilde{W}}_{j,t}=\sum_{k=0}^{{K}_{j}-1}{\tilde{h}}_{j,k}\,{X}_{t-k \bmod N}$$
(1)
$${\tilde{V}}_{j,t}=\sum_{k=0}^{{K}_{j}-1}{\tilde{g}}_{j,k}\,{X}_{t-k \bmod N}$$
(2)

where \({\tilde{W }}_{j,t}\) is the wavelet coefficient; \({\tilde{V }}_{j,t}\) is the scale coefficient; mod N denotes the modulo operation, which treats the historical series as periodic with period N; and \({K}_{j}\) can be obtained by Eq. (3).

$${K}_{j}=\left({2}^{j}-1\right)\left(K-1\right)+1$$
(3)

The value of \({K}_{j}\) represents the number of wavelet and scale coefficients affected by BC for decomposition level j and wavelet filter length K. Thus, using this equation, it is possible to obtain wavelet and scale coefficients that have been “corrected by limits”, that is, values that avoid introducing additional uncertainty into the wavelet and scale coefficients due to the problem of “future data” (Bašta, 2014).
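As a worked illustration of Eq. (3), the sketch below evaluates \({K}_{j}\) for the filter lengths and decomposition levels considered in this study (4 for db4, 14 for la14 and 6 for c6). The function names `boundary_count` and `trim_boundary` are hypothetical, and the trimming step assumes that the affected coefficients are the first \({K}_{j}\) values of the decomposed series.

```python
from typing import List, Sequence


def boundary_count(j: int, K: int) -> int:
    """Eq. (3): coefficients affected by the boundary at level j, filter length K."""
    return (2 ** j - 1) * (K - 1) + 1


def trim_boundary(coeffs: Sequence[float], j: int, K: int) -> List[float]:
    """Drop the first K_j coefficients (assumed to be the boundary-affected ones)."""
    return list(coeffs[boundary_count(j, K):])


# (level j, filter length K) for each filter/level pair named in the text
configs = {
    "db4-j6": (6, 4), "db4-j8": (8, 4),
    "la14-j4": (4, 14), "la14-j6": (6, 14),
    "c6-j4": (4, 6), "c6-j6": (6, 6),
}
for name, (j, K) in configs.items():
    print(name, boundary_count(j, K))  # db4-j8 gives 766
```

Longer filters and deeper levels discard more records, which is why the choice of filter also determines how much data remains for training.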

MODWT uses a high pass filter \((\tilde{h })\) to calculate its wavelet coefficients and applies an iterative construction of the time series (Xt), which can be reconstructed using Eq. (4).

$${X}_{t}={\tilde{W }}_{j,t}+{\tilde{V }}_{j,t}$$
(4)
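The circular convolution in Eqs. (1) and (2) can be sketched for a single level as follows. For brevity, the example assumes the two-tap Haar filter at level 1 rather than the db4, la14 or c6 filters used in this study; the function name `modwt_level` and the sample series are illustrative only. For this particular filter and level, the sum of the wavelet and scale coefficients returns the original series, as in Eq. (4).

```python
from typing import List, Sequence, Tuple


def modwt_level(
    x: Sequence[float], h_tilde: Sequence[float], g_tilde: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Circularly filter x with the MODWT wavelet (h_tilde) and scale (g_tilde) filters."""
    n = len(x)
    # (t - k) % n treats the series as periodic with period N = n
    w = [sum(h_tilde[k] * x[(t - k) % n] for k in range(len(h_tilde))) for t in range(n)]
    v = [sum(g_tilde[k] * x[(t - k) % n] for k in range(len(g_tilde))) for t in range(n)]
    return w, v


# Haar DWT filters (1/sqrt(2), -1/sqrt(2)) and (1/sqrt(2), 1/sqrt(2)),
# rescaled by 2^(j/2) with j = 1 (assumed example filter, not the study's)
HAAR_H = [0.5, -0.5]
HAAR_G = [0.5, 0.5]

x = [0.0, 2.0, 4.0, 1.0, 0.0, 3.0]  # made-up daily precipitation values (mm)
w, v = modwt_level(x, HAAR_H, HAAR_G)
reconstructed = [wi + vi for wi, vi in zip(w, v)]
print(reconstructed)
```

Note that the first coefficient uses the last value of the series through the modulo index, which is exactly the boundary effect that Eq. (3) quantifies.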

In practice, MODWT decomposition is performed on a data series for which the type of filter (wavelet), the level of decomposition and the boundary treatment (periodic or reflective) are selected. If periodic, the resulting wavelet and scale coefficients are calculated without duplicating the original series, treating (Xt) as if it were circular. If reflective, the series is extended by reflection to twice the length of the original series. In the present study, the periodic boundary was adopted, and three wavelet families were selected: Daubechies (db4) at levels 6 and 8 (db4-j6 and db4-j8), least asymmetric (la14) at levels 4 and 6 (la14-j4 and la14-j6) and coiflet (c6) at levels 4 and 6 (c6-j4 and c6-j6). These were chosen because they are the most commonly used for hydrological data series (Maheswaran & Khosa, 2012; Santos et al., 2019) and because they provide diversified decompositions.
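The difference between the two boundary treatments can be illustrated with a short sketch. It assumes that reflection appends the time-reversed series to the original one, which is a common convention but is not specified in the text; both function names are hypothetical.

```python
from typing import List, Sequence


def periodic_value(x: Sequence[float], t: int) -> float:
    """Periodic boundary: indices wrap around, so the series behaves as circular."""
    return x[t % len(x)]


def reflect_extend(x: Sequence[float]) -> List[float]:
    """Reflective boundary: append the reversed series, doubling the length."""
    return list(x) + list(reversed(x))


series = [1.0, 2.0, 3.0]
print(periodic_value(series, -1))  # 3.0, the value "before" the first record
print(reflect_extend(series))      # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```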

Artificial neural network

ANNs are computational models that imitate the functioning of the human brain, with the aim of analysing a given system and reproducing it. The learning of an ANN occurs through an iterative process applied to synaptic weights (wkn) and bias (bk), called training. According to Haykin (2007), the training of an ANN is performed by an algorithm, which adjusts a matrix of synaptic weights. Thus, the output vector must match a desired target value for each input vector. This process is cyclical for the training sample set until a previously stipulated stopping criterion is reached. After training, it is expected that the ANN will be able to generalize information, obtaining coherent outputs with input vectors not used in the training set. It is also expected that the minimum error found in training will be similar to the error in simulation in an entirely different set.

The main architectures of artificial neural networks can be divided into single layer feedforward networks, multilayer feedforward networks, recurrent networks and reticulated networks. The difference between them lies in the arrangement of their neurons, the way the neurons are interconnected and the constitution of their layers. In this study, the ANFIS network was used.

ANFIS network

ANFIS is a neural network that combines the fuzzy inference system (FIS) with an ANN. ANFIS is considered a fuzzy inference system organized in the form of an adaptive network capable of mapping input and output data based on the knowledge of an expert. The adaptive network is a multilayer network with a feedforward architecture, composed of nodes interconnected by unidirectional connections and trained by supervised learning (Jang, 1993). A neuro-fuzzy network is usually made up of three layers. The first layer (fuzzification) represents the input variables, that is, the antecedent terms of the rule. The second layer (intermediate) represents the fuzzy rules, and the third layer (defuzzification) represents the output variables, that is, the consequent term of the rule. However, there can be several types of FIS in an ANFIS network, which can vary depending on the reasoning and rules applied.

The Takagi–Sugeno FIS (Takagi & Sugeno, 1985), adopted in this study, associates a set of linguistic rules whose antecedent (“if” part) consists of fuzzy propositions and whose consequent (“then” part) is given by expressions of type y = f(x) of the linguistic variables of the antecedent. With this system and a training dataset (input and output pairs), it is possible to make predictions of a given variable using an ANFIS architecture (Fig. 2). This architecture is composed of five layers, each of which has a specific purpose (Jang, 1993).

Fig. 2 MODWT and ANN hybrid model with ANFIS network architecture

In the first layer, the degree of membership of the inputs x and y is calculated according to the type of membership function (MF) chosen in these nodes (A1, A2, B1 and B2). In the second layer, neurons perform the t-norm operation as the algebraic product (neuron ∏) (Eq. 5), considering the MF (\(\mu )\) and the linguistic terms (Ai, Bi).

$${w}_{i}={\mu }_{Ai}\left(x\right){\mu }_{Bi}\left(y\right), i=1, 2\dots$$
(5)

In the third layer, the firing strengths obtained in the second layer are normalized (Eq. 6) by the sum of the weights (w) of all neurons.

$${\overline{w} }_{i}=\frac{{w}_{i}}{{w}_{1}+{w}_{2}}, i=1, 2\dots$$
(6)

In the fourth layer, the outputs of neurons are calculated by the product between the normalized firing levels and the value of the consequent rules. Its parameters correspond to the coefficients of the affine expressions and the neuron activation function, which form the fourth layer (Eq. 7), where \({p}_{i}\), \({q}_{i}\) and \({r}_{i}\) are the parameters associated with the consequents of the rules.

$${z}_{4,i}={\overline{w}}_{i}\,{f}_{i}={\overline{w}}_{i}\left({p}_{i}x+{q}_{i}y+{r}_{i}\right)$$
(7)

In the fifth layer, the system output is calculated, which together with the nodes of the third and fourth layers promote the defuzzification or sum total of all input signals (Eq. 8).

$$f=\frac{\sum_{i}{w}_{i}\,{f}_{i}}{\sum_{i}{w}_{i}}$$
(8)
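The five layers described by Eqs. (5) to (8) can be followed in a minimal forward pass for two inputs and two rules. This is only a sketch: it assumes Gaussian membership functions and the usual first-order Sugeno consequent \({f}_{i}={p}_{i}x+{q}_{i}y+{r}_{i}\), and all parameter values are hypothetical rather than trained values from this study.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_mf(value: float, centre: float, sigma: float) -> float:
    """Gaussian membership degree in (0, 1]."""
    return math.exp(-0.5 * ((value - centre) / sigma) ** 2)


def anfis_forward(
    x: float,
    y: float,
    mf_x: Sequence[Tuple[float, float]],
    mf_y: Sequence[Tuple[float, float]],
    consequents: Sequence[Tuple[float, float, float]],
) -> Tuple[float, List[float]]:
    """Return the network output and the normalized firing strengths."""
    # Layer 1: membership degrees of x in A_i and of y in B_i
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_x]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_y]
    # Layer 2: firing strength of each rule by algebraic product (Eq. 5)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalization of the firing strengths (Eq. 6)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the rule consequent (Eq. 7)
    z = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: the sum of the layer-4 outputs is the system output (Eq. 8)
    return sum(z), w_bar


# Hypothetical parameters: (centre, sigma) per MF and (p, q, r) per rule
mf_x = [(0.0, 1.0), (1.0, 1.0)]
mf_y = [(0.0, 1.0), (1.0, 1.0)]
consequents = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
output, w_bar = anfis_forward(0.5, 0.5, mf_x, mf_y, consequents)
print(output, w_bar)
```

Because the example inputs lie midway between the two membership centres, both rules fire equally and the output is the plain average of the two consequents.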

For the application of ANFIS in precipitation forecasting, a number of membership functions (NMF) of two was adopted as the initial parameter for each input variable, and the membership function (MF) type was chosen according to the best performance of the network, among triangular, trapezoidal, Gaussian and sinusoidal.

Time-lagged neural network

In problems involving the prediction of time series, a useful device is to incorporate a memory at the input layer of the neural network. This memory strengthens the learning of the behaviour of the time series, which is passed on to the other layers of the network, improving the results. Thus, the combination of inputs based on antecedent times is suggested in this work. Four combinations were adopted, considering the precipitation of 2, 3, 4 and 5 days before (t-2, t-3, t-4 and t-5) to forecast the current day. In forecasting hydrological variables, the optimal length of this delay is not well defined. However, Shoaib et al. (2018), Kim et al. (2020) and Hammad et al. (2021) consider up to five delays to be an acceptable number, and this value is adopted in this study. Furthermore, incorporating other climatic variables (air temperature, wind, solar radiation, etc.) in precipitation forecasts can generate errors due to the uncertainty of the real influences that such variables can exert on precipitation.
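A time-lagged input set can be assembled as in the sketch below. It assumes that a combination with n delays uses the n days immediately preceding the forecast day as inputs; the text does not spell out the exact layout, so this reading, the function name `make_lagged` and the sample values are assumptions.

```python
from typing import List, Sequence, Tuple


def make_lagged(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets): each input row holds the n_lags previous days."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    inputs = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return inputs, targets


precip = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up daily values (mm)
for n in (2, 3, 4, 5):  # the four combinations tested
    X, y = make_lagged(precip, n)
    print(n, len(X), X[0], y[0])
```

Each additional delay removes one usable sample from the start of the series and adds one input to the network, which is part of the trade-off between memory and training cost.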

Seasonality assessment

The daily precipitation data from the rainfall gauge stations were organized in two ways: (1) rainy period, formed by 3444 daily precipitation values from the months of November–April of 1998–2016, divided into 2584 values for calibration (01/1998–02/2012) and 860 values for validation (02/2012–12/2016); (2) dry period, formed by 3496 daily precipitation values from May–October of 1998–2016, divided into 2624 values for calibration (05/1998–06/2012) and 872 values for validation (06/2012–10/2016). This division aims to assess the influence of seasonality on the model’s response. In network processing, data were standardized (Eq. 9) and divided for calibration (75%) and validation (25%).

$${P}_{pad}=\frac{{P}_{i}-{P}_{min}}{{P}_{max}-{P}_{min}}$$
(9)

where \({P}_{pad}\) is the standardized precipitation, \({P}_{i}\) is the precipitation to be standardized and \({P}_{min}\) and \({P}_{max}\) are the smallest and largest values, respectively, observed in the precipitation series. Standardization implies scaling the samples to the dynamic range of activation functions of hidden layers, typically represented by the logistic function or hyperbolic tangent, to avoid saturation of neurons, as adopted by Nourani et al. (2017).
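The standardization of Eq. (9) and the chronological 75%/25% division can be sketched as follows. The function names are hypothetical, and the split index is obtained by simple truncation, which may differ by a record or two from the exact counts reported for each period.

```python
from typing import List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Eq. (9): rescale the series to [0, 1] using its smallest and largest values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("series is constant; Eq. (9) is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(
    values: Sequence[float], calibration_fraction: float = 0.75
) -> Tuple[List[float], List[float]]:
    """Keep time order: first part for calibration, remainder for validation."""
    cut = int(len(values) * calibration_fraction)
    return list(values[:cut]), list(values[cut:])


precip = [2.0, 4.0, 6.0, 2.0, 3.0, 5.0, 6.0, 4.0]  # made-up values (mm)
scaled = min_max_scale(precip)
calibration, validation = chronological_split(scaled)
print(scaled)
print(len(calibration), len(validation))
```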

Performance criteria

Model performance was assessed using statistical parameters, which are used to quantify the agreement between observed and estimated data. In this study, we used two classic criteria, the mean square error (MSE, mm) and the Nash–Sutcliffe coefficient (Nash), represented by Eqs. (10) and (11), respectively.

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({Y}_{obs}-{Y}_{est}\right)}^{2}$$
(10)
$$Nash=1-\frac{\sum {\left({Y}_{obs}-{Y}_{est}\right)}^{2}}{\sum {\left({Y}_{obs}-\overline{X }\right)}^{2}}$$
(11)

where n is the number of samples, \({Y}_{obs}\) is the observed precipitation, \({Y}_{est}\) is the estimated precipitation and \(\overline{X }\) is the average of the observed precipitation. The best performing models are those with low MSE and Nash values close to 1 (Chai & Draxler, 2014; Nash & Sutcliffe, 1970).
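Both criteria can be computed directly from paired observed and estimated series, as in the sketch below. The MSE is taken here in its usual form, the mean squared difference between observed and estimated values; the function names and sample values are illustrative.

```python
from typing import Sequence


def mse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean squared difference between observed and estimated values."""
    if len(observed) != len(estimated):
        raise ValueError("series must have the same length")
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def nash_sutcliffe(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Eq. (11): 1 minus residual sum of squares over the spread of the observations."""
    if len(observed) != len(estimated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


obs = [1.0, 2.0, 3.0]
print(mse(obs, obs), nash_sutcliffe(obs, obs))    # perfect fit: 0.0 and 1.0
print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))        # predicting the mean: 0.0
```

A Nash value of 0 therefore means the model is no better than always predicting the observed mean, which gives a reference point for reading values close to 1.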

The methodology adopted in this study consists of the following steps (Fig. 3):

  • The collection, organization and standardization of precipitation data;

  • The decomposition of the historical series by MODWT with wavelet filters;

  • The calibration of the model, performed through MODWT-AI training and adjustments of network parameters, input type and wavelet filters (75% of the historical series); and

  • The validation of the model through the adoption of the optimal parameters obtained in the calibration (25% of the historical series) with performance criteria.

Fig. 3 Methodology flowchart

Results and discussion

Using the MODWT, the maximum level of decomposition was found to be eight (Jmax = 8), and the lengths (L) of the wavelet filters were 4 for db4, 14 for la14 and 6 for c6. Thus, with a maximum level of decomposition equal to 8 and a filter length K equal to 4, Eq. (3) gives Kj = 766 coefficients affected by the boundary at level j (the same procedure was also adopted for j = 4 and 6 and for L = 6 and 14). Therefore, the first 766 records of input data from the stations were removed after decomposition with the db4-j8 wavelet. Then, the ANFIS network was trained at each station through the method of successive approximations for the dry and rainy periods with data from ANA and CMORPH. Tests were also carried out to assess the optimal parameters. After the simulations at each station, with the different filters and levels adopted, the best parameters were defined for the lagged inputs in terms of the number of membership functions (NMF), type of membership function (MF) and number of epochs. An NMF of 2 for each input and the generalized bell MF (gbellmf) gave the lowest errors for training, testing, validation and the FIS of the network (0.01570, 0.01656, 0.01601 and 0.01542, respectively), with inputs delayed by 4 days (Table 2). The selected output function was constant, and the training method was hybrid.

Table 2 MODWT-ANFIS model calibration parameters

Simulations with the ANFIS network showed that increasing the number of membership functions and the input lag increased computational time and effort without any gain for the network, as the errors (MSE) did not change appreciably. Therefore, increasing the number of inputs and the NMF is not advisable for this type of precipitation forecast. This may be related to the work that the ANFIS network performs for each MF and each input variable, which raises the computational effort. For this reason, the inputs with 5 days of delay were run only with 4 NMF to make the training more efficient. Regarding the number of epochs, values from 2 to 100 epochs were tested. The value of 30 epochs was adopted because the MSE showed no significant reduction beyond it. For the wavelet filter, db4-j8 gave the best fit for the series with four delays in the ANFIS networks (Table 2). Table 3 presents the optimized parameters of the ANFIS network.

Table 3 ANFIS parameters after training

The Daubechies (db4) wavelet was able to decompose the seasonal component of the time series more efficiently, and its results for levels (j) 6 and 8 and length (L) 4 presented small errors and Nash values close to the ideal. According to Maheswaran and Khosa (2012), the good db4 performance is due to its broader support in seasonal time series, its ability to smooth the signal and its good localization in time and frequency. This process is necessary for precipitation series that present temporal intercurrence. The least asymmetric wavelet (la14) and the coiflet (c6) combined with the ANN also presented good results, with small errors and high Nash values. However, their performance did not differ substantially from that of db4. This shows that increasing the length (L) of the filter (6 and 14) did not bring significant improvements in this case and that the db4 filter with a length (L) of 4 is sufficient for good signal decomposition.

The best filter, according to Zhang et al. (2015), should be the one with the most similar decomposition to the characteristics of the studied series. However, when choosing a filter, other parameters are also associated with the filter. Thus, according to the tests performed, the factors that most influenced the simulations were the level of decomposition and the length of the wavelet. The fit of the best model with level 8 and length 4 has a smoother adjustment and considers the boundary conditions. It provided a moderate and permissible fit for the decomposition of the precipitation data. The longer length (6 and 14) did not show higher quality and could remove a much larger number of wavelet coefficients adjusted by BC, compromising the amount of input data in the model simulation with ANN.

To avoid errors and circumvent BC, it is necessary to choose an adequate wavelet and sufficient input data for training and forecasting (Du et al., 2017; Quilty et al., 2016; Ramírez-Hernández et al., 2016). In the selection of the precipitation series, this issue was addressed by testing three wavelet filters and three levels of decomposition, by removing the coefficients affected by the boundary at level j and by adjusting the division of the data used in calibration and validation. Thus, it was possible to filter the data series, free them of uncertainties related to BC and still retain adequate amounts of input data for the training and validation of the neural networks.

In the validation of the MODWT-ANFIS model, tests were performed with 25% of the temporal series, corresponding to the seasonal period (rainy and dry) from 2012 to 2016. In this case, the model presented a Nash value close to 1 and an MSE value less than 0.1 (Fig. 4).

Fig. 4 Performance of the MODWT-ANFIS model in the rainy (a) and dry (b) seasons

The effectiveness of the ANFIS model in daily precipitation simulations can be explained by the ability to incorporate fuzzy rules to assist in simulations, being sensitive to learning datasets and able to learn much more during the training period and improve simulations in the testing phase (Seera et al., 2012; Roy & Singh, 2020). Choubin et al. (2016), for example, found that the ANFIS model combined with other techniques can be sufficiently satisfactory in simulating precipitation. The small numbers of modelled data entries with small time delays proved to be effective, as demonstrated by the resulting Nash values close to 1.0. This small number of entries can be considered a great advantage of the model, as it allows overcoming the problem of drier periods (Costa et al., 2015; Suhaila et al., 2011), which require more information from previous days to simulate future days. Furthermore, according to Nerantzaki and Papalexiou (2019), the estimation of precipitation events is still a challenge in the literature and requires specific methods for its modelling. The model was also able to satisfactorily simulate the precipitation of stations E1, E2, E3 and E4 (Fig. 1), with high precipitation located in the Amazon biome, and the precipitation of stations E5 and E6 (Fig. 1), with low precipitation located in the transition region and in the Cerrado biome. In other words, the model had no problems reproducing the precipitation resulting from the Amazon’s climate variability. However, other models have shown problems with this reproduction (Detzel & Mine, 2011; Liu et al., 2011; Ng et al., 2017; Wilks, 1999).

Conclusion

The MODWT-ANFIS model was calibrated, trained and validated, and it satisfactorily simulated the daily precipitation in the Amazon, considering seasonality and the region’s biomes. The small number of data entries input into the model with small time delays proved to be effective and was considered a great advantage of the model. This method can overcome the problems associated with dry periods, which require more information from previous days to simulate future days. The pre-processing of data performed by MODWT was essential to remove noise from the original time series and correct the boundary conditions that could harm the model’s simulations. This stage in the development of the models, together with the time-lagged inputs, configures one of the advantages of hybrid models, such as the analysed model.

The results generated may help future work to better understand the daily precipitation modelling and its behaviour in the Amazon region, which has been suffering from fires and deforestation, impacting the region’s hydrological cycle and affecting various activities, such as human supply, sanitation, agribusiness, water supply, hydroelectric production and waterway transport. This hydrological imbalance affects other regions of the country (midwest, southeast and south), which depend on evapotranspiration (ET) from the Amazon to produce rain, which is also important for the water uses mentioned above. Finally, the global climate is sensitive to changes in the Amazon hydrological cycle.