1 Introduction

Population growth and industrial development have had marked negative effects on natural resources (Bawa and Seidler 2015; Heald and Spracklen 2015). In addition, the occurrence of natural disasters such as droughts, floods, hurricanes and tsunamis has increased recently (Dhanalakshmi et al. 2015). Precise time series analysis and modeling of rainfall is therefore essential for modeling drought and flood events (French et al. 1992; Nirmala and Sundaram 2010; Srivastava et al. 2010; Danandeh Mehr et al. 2018), and long-term rainfall forecasting is essential for proper management of water resources. For instance, the amount of rainfall determines the groundwater status, which in turn governs the water supply in various areas (van Eekelen et al. 2015). Rainfall also has significant effects on ecological phenomena such as agricultural practices (de Abreu-Harbich et al. 2015).

To understand the basic aspects of this stochastic process, several physically based and probabilistic models have been developed. Previously, stochastic models such as the Auto Regressive Integrated Moving Average (ARIMA), which has high computational costs (Kaushik and Singh 2008), received a lot of attention in water-related studies. Momani (2009) used ARIMA for rainfall forecasting but concluded that the model cannot accurately predict peak amounts. Moreover, these models are linear and unable to capture the irregularities of rainfall (Cramer et al. 2018).

More recently, machine learning (ML) algorithms have been examined for environmental processes (Babovic 2005) such as velocity prediction in compound channels with vegetated floodplains (Harris et al. 2003), the Chezy resistance coefficient in corrugated channels (Giustolisi 2004), rainfall (Nastos et al. 2014), soil temperature (Samadianfard et al. 2018), evaporative loss (Deo and Samui 2017), Manning’s n in meandering flows (Pradhan and Khatua 2017), pan evaporation (Qasem et al. 2019), dew point temperature (Naganna et al. 2019) and global solar radiation (Samadianfard et al. 2019). In this regard, rainfall forecasting using ML techniques may be beneficial. Commonly implemented ML methods for rainfall forecasting are artificial neural networks (ANN), fuzzy logic (FL), genetic programming (GP) and support vector regression (SVR) (e.g., Pongracz et al. 2001; Moustris et al. 2011; Yaseen et al. 2017; Danandeh Mehr et al. 2018). As an example, Venkata Ramana et al. (2013) used a wavelet ANN to forecast the rainfall time series of Darjeeling in India and stated that the wavelet ANN models were better than the single ANN. In another study, the performances of ANFIS and SVR were compared for rainfall forecasting (Shamshirband et al. 2014); the authors concluded that the ANFIS model was more accurate than the SVR.

However, recent research has shown that stand-alone ML models are not sufficiently accurate for rainfall forecasting in arid areas, especially over long time periods. Hybrid methods such as wavelet-ANN and wavelet-SVR (Kisi and Cimen 2012) were therefore suggested. For example, a genetic algorithm (GA) was implemented to optimize the structures of ANN models for rainfall forecasting (Saxena et al. 2014). Nourani et al. (2009) demonstrated that a hybrid wavelet-ANN conjunction model can accurately forecast rainfall 1 month ahead in the Liqvan basin. Solgi et al. (2014) demonstrated that a wavelet-ANN model performed better than ANFIS for rainfall forecasting. In another study, Yaseen et al. (2018) implemented the firefly optimization algorithm to improve the potential of ANFIS models for rainfall forecasting, and the results showed the advantage of the hybrid model over the standalone ANFIS.

Most published work has concentrated on rainfall forecasting at short time scales. There are only a few studies on the efficiency of hybrid ML methods for rainfall forecasting over long time periods (e.g., Farajzadeh and Alizadeh 2017). Thus, further examination is needed to model the long-term non-linear behavior of rainfall events. As stated before, precise forecasting of long-term rainfall is typically challenging because of the non-linear interrelations between rainfall and its previous amounts.

The main goal of the current research is to investigate the feasibility of a hybrid artificial intelligence model for modeling the rainfall pattern at the annual scale in the Senegal region. The proposed model integrates the whale optimization algorithm (WOA) with the multilayer perceptron (MLP) model. This is the first implementation of MLP-WOA at the annual rainfall scale and in this particular region. The novelty of the current research therefore lies in testing whether WOA can improve the accuracy of the MLP method, and accordingly in obtaining more accurate predictions of annual rainfall and deeper knowledge of the annual rainfall pattern in the studied region. The developed model offers an alternative approach to modeling climatological processes, based on hybridizing artificial intelligence models with a nature-inspired algorithm.

2 Material and Methods

2.1 Study Area and Data

The annual rainfall data of the Fatick (latitude 14.33 N, longitude −16.40 E) and Goudiry (latitude 14.18 N, longitude −12.72 E) stations in Senegal, which lie in somewhat different climate zones, were used in the current study (Fig. 1). The records cover the period from 1933 to 2013. Table 1 presents the statistical parameters of the annual rainfall at both stations. The data series were split into two parts: data from 1933 to 1993 (75% of the whole series) were used for training and the remaining data from 1993 to 2013 (25% of the whole series) were used for testing the studied models (Fig. 2). The data division was selected by trial and error, with the 75%/25% split giving the best learning process for the developed predictive model.

Fig. 1 Location of the study area (Senegal) and the selected stations

Table 1 Annual statistical parameters of rainfall (mm) for the studied stations

Fig. 2 The raw rainfall (mm) time series of Fatick and Goudiry

Table 1 indicates that the annual rainfall at both stations is approximately normally distributed, as reflected by the low skewness values. Moreover, Pearson correlation coefficients, histograms and Q-Q plots were used to check the homogeneity of the data, as illustrated in Fig. 3. The data at both stations showed a normal distribution and high correlation with the first 3 lags.
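The two checks described above can be sketched as follows. This is a minimal illustration on a synthetic series, not the station records; the helper names `skewness` and `lag_correlation` are hypothetical, and the population form of the skewness coefficient is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def skewness(x):
    """Skewness as the third standardized moment (population form, assumed)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def lag_correlation(x, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

# Synthetic stand-in for an annual rainfall series (mm); NOT the study data.
rng = np.random.default_rng(0)
rain = 600.0 + 150.0 * rng.standard_normal(81)

print("skewness:", round(skewness(rain), 3))
for k in (1, 2, 3):
    print(f"lag-{k} correlation:", round(lag_correlation(rain, k), 3))
```

A skewness close to zero is consistent with a symmetric distribution, and the lag correlations indicate how much information the antecedent values carry about the current one.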

Fig. 3 The Pearson correlation coefficients, histogram and Q-Q plot for the 2 stations in this study

2.2 Multilayer Perceptron (MLP) Neural Network

Artificial neural networks are inspired by the structure of the natural nervous system. By modeling the neurons of the human brain, the method simulates neuronal behavior through mathematically defined functions, and the synaptic function is modeled through the computational weights on the connections between artificial neurons. The empirical and flexible nature of this method makes it possible to address complex problems such as the prediction of nonlinear behavior. To learn a pattern, the network is trained with a set of data; when new inputs are presented, it calculates the appropriate output from the relationship found in the training stage. Among the many types of neural networks, the back-propagation network is the most widely applied (Mohanty et al. 2013). This network is composed of layers whose elements are called neurons. Each layer is fully connected to the layers before and after it.

Figure 4 shows the three-layered structure used in the current research. It consists of (i) an input layer, (ii) a hidden layer, and (iii) an output layer. The independent variables in the input layer are Pt-1, Pt-2 and Pt-3, and the dependent variable used as output is Pt. The optimum network architecture was 3-8-1, that is, 3 input neurons, 1 hidden layer with 8 neurons and 1 output neuron. In addition, the tangent sigmoid function for the input layer and the linear function for the output layer were selected, and the network was trained with the Levenberg-Marquardt algorithm (LMA) for 200 iterations. It should be noted that in machine learning models, each additional internal neuron increases the complexity of the learning process. In this case, three input variables were used to construct the one-step-ahead prediction, which required 9 computing neurons (8 hidden and 1 output) to build the network.
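As a rough illustration of the 3-8-1 architecture, the forward pass can be sketched in NumPy. This is only a sketch under stated assumptions: the hyperbolic tangent is applied to the hidden-layer pre-activations (one reading of the "sigmoid tangent" function mentioned above), the weights are randomly initialized rather than trained, and the names `init_mlp`, `forward` and `n_parameters` are hypothetical.

```python
import numpy as np

def init_mlp(n_in=3, n_hidden=8, n_out=1, seed=0):
    """Random (untrained) weights for a 3-8-1 network; the scale 0.5 is arbitrary."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)),   # input -> hidden weights
        "b1": np.zeros(n_hidden),                        # hidden biases
        "W2": rng.normal(0.0, 0.5, (n_out, n_hidden)),  # hidden -> output weights
        "b2": np.zeros(n_out),                           # output bias
    }

def forward(params, X):
    """X has shape (n_samples, 3) with columns P(t-1), P(t-2), P(t-3)."""
    hidden = np.tanh(X @ params["W1"].T + params["b1"])  # tangent sigmoid (assumed)
    return hidden @ params["W2"].T + params["b2"]         # linear output layer

def n_parameters(params):
    """Total number of adjustable weights and biases."""
    return sum(p.size for p in params.values())

params = init_mlp()
X_demo = np.array([[0.2, 0.5, 0.1], [0.7, 0.3, 0.9]])  # scaled dummy inputs
print(forward(params, X_demo).shape)  # one prediction per input row
print(n_parameters(params))           # 3*8 + 8 + 8*1 + 1 weights and biases
```

Under these assumptions the network has 41 adjustable parameters, which is the dimension of the search space an optimizer has to explore when tuning the weights.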

Fig. 4 Arrangement of the artificial neural network used in this study

2.3 Multilayer Perceptron Integrated with the Whale Optimization Algorithm (MLP-WOA)

The whale optimization algorithm is a heuristic algorithm belonging to the family of stochastic population-based algorithms, proposed by Mirjalili and Lewis (2016); it imitates the foraging of humpback whales. Humpback whales hunt schools of krill or small fish close to the surface using a special hunting method known as bubble-net feeding. They swim around the prey within a shrinking circle and create distinctive bubbles along a spiral-shaped path (Fig. 5). The WOA mimics this behavior in two stages: the first is exploitation, which involves encircling the prey and the spiral bubble-net attack, and the second is exploration, which involves searching randomly for prey.

Fig. 5 The mechanism of the Whale Optimization Algorithm (WOA)

The WOA can detect the location of the prey and encircle it. Since the optimum location in the search space is not known in advance, the algorithm assumes that the current best location is the target prey or is close to the optimum. Once the best search agent is defined, the other search agents update their locations toward it; in the exploration stage, a location is instead updated according to a randomly chosen search agent rather than the best one obtained so far. The encircling behavior is characterized by the following equations:

$$ \overrightarrow{D}=\left|\overrightarrow{C}.\overrightarrow{X^{\ast }}-\overrightarrow{X}(t)\right| $$
(4)
$$ \overrightarrow{X}\left(t+1\right)=\overrightarrow{X^{\ast }}(t)-\overrightarrow{A}.\overrightarrow{D} $$
(5)

where t represents the current iteration, \( \overrightarrow{C} \) and \( \overrightarrow{A} \) are coefficient vectors, \( \overrightarrow{X^{\ast }} \) is the location vector of the best solution obtained so far, and \( \overrightarrow{X} \) is the location vector of the search agent. \( \overrightarrow{A} \) and \( \overrightarrow{C} \) are calculated as follows:

$$ \overrightarrow{A}=2\overrightarrow{a}.\overrightarrow{r}-\overrightarrow{a} $$
(6)
$$ \overrightarrow{C}=2.\overrightarrow{r} $$
(7)

where a decreases linearly from 2 to 0 over the course of the iterations and r is a random vector drawn from a uniform distribution on the interval [0, 1]. According to Eq. (5), the solutions update their locations based on the location of the best known solution (prey). Adjusting the values of the A and C vectors controls the areas around the best solution (prey) where a solution (whale) can be positioned. A humpback whale traps its prey by moving along a spiral trail around it; in WOA, the shrinking encircling behavior is achieved by decreasing a in Eq. (6) according to the following equation:

$$ a=2-t\frac{2}{MaxIter} $$
(8)

where t is the iteration number and MaxIter is the maximum number of permissible iterations. To simulate the spiral-shaped path, the distance between a search agent (solution) \( \overrightarrow{X} \) and the best solution known so far (leading solution) \( \overrightarrow{X^{\ast }} \) is calculated. A spiral equation is then used to create the position of the neighboring search agent as follows:

$$ \overrightarrow{X}\left(t+1\right)={D}^{\prime }.{e}^{bL}.\cos \left(2\pi L\right)+\overrightarrow{{\mathrm{X}}^{\ast }}(t) $$
(9)

where \( {D}^{\prime }=\left|\overrightarrow{X^{\ast }}(t)-\overrightarrow{X}(t)\right| \) is the distance between the ith whale and the prey, b is a constant defining the shape of the logarithmic spiral, and L is a random number in [−1, 1]. As mentioned above, humpback whales swim around the prey within a shrinking circle and along a spiral-shaped path at the same time. To simulate the two mechanisms, it is assumed that each is chosen with a probability of 50% during the optimization process as follows:

$$ \overrightarrow{X}\left(t+1\right)=\left\{\begin{array}{c}\mathrm{Shrinking}\ \mathrm{Encircling}\ \left(\mathrm{eq}.5\right)\kern1em \mathrm{if}\left(\mathrm{P}<0.5\right)\\ {}\mathrm{spiral}-\mathrm{shaped}\ \mathrm{path}\ \left(\mathrm{eq}.9\right)\kern1.5em \mathrm{if}\left(\mathrm{P}\ge 0.5\right)\kern0.5em \end{array}\right. $$
(10)

where P is a random number in [0, 1]. In this study, the values of P and L were 0.65 and 0.37, respectively, and the population size and maximum number of iterations were 30 and 50, respectively. The optimum number of neurons in the hidden layer was 8 (Table 2).
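The update rules in Eqs. (4) to (10) can be sketched as a generic minimizer. This is a minimal sketch, not the implementation used in the study: the function name `woa_minimize`, the search bounds, the spiral constant b = 1 and the test objective are assumptions; P and L are drawn at random as in their definitions; and the random-agent exploration step is left out because the section gives no equation for it. The population size (30) and maximum number of iterations (50) follow the values stated above.

```python
import numpy as np

def woa_minimize(objective, dim, bounds=(-1.0, 1.0), n_agents=30,
                 max_iter=50, b=1.0, seed=0):
    """Minimize `objective` with the encircling and spiral moves of Eqs. (4)-(10)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_agents, dim))          # initial whale positions
    fitness = np.array([objective(x) for x in X])
    best = X[fitness.argmin()].copy()                 # X*, best solution so far
    best_fit = float(fitness.min())

    for t in range(max_iter):
        a = 2.0 - t * 2.0 / max_iter                  # Eq. (8)
        for i in range(n_agents):
            A = 2.0 * a * rng.random(dim) - a         # Eq. (6)
            C = 2.0 * rng.random(dim)                 # Eq. (7)
            if rng.random() < 0.5:                    # Eq. (10), P < 0.5
                D = np.abs(C * best - X[i])           # Eq. (4)
                X[i] = best - A * D                   # Eq. (5)
            else:                                     # Eq. (10), P >= 0.5
                L = rng.uniform(-1.0, 1.0)
                D_prime = np.abs(best - X[i])
                X[i] = D_prime * np.exp(b * L) * np.cos(2.0 * np.pi * L) + best  # Eq. (9)
            X[i] = np.clip(X[i], lo, hi)              # keep agents inside the bounds
            f = objective(X[i])
            if f < best_fit:                          # track the leading solution
                best_fit, best = float(f), X[i].copy()
    return best, best_fit

# Illustrative objective only; in a hybrid MLP-WOA setting the objective would
# typically be the training error as a function of the flattened network weights.
sphere = lambda x: float(np.sum(x**2))
best, best_fit = woa_minimize(sphere, dim=5)
print(best_fit)
```

Because the leading solution is replaced only when a better one is found, the returned objective value can never be worse than that of the best agent in the initial population.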

Table 2 Values of the variables used in the WOA method

In this research, two models, MLP and MLP-WOA, are used to estimate Pt from three input variables, Pt-1, Pt-2 and Pt-3. In both models, the Pt-1, Pt-2 and Pt-3 values serve as the input variables and the Pt values as the output variable (Fig. 6). Each data set consists of 80 samples. In both models, 75% of the data (60 samples) is used for training and 25% (20 samples) for testing.
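The construction of the lagged input matrix and the chronological 75%/25% split can be sketched as below. The helper names `make_lagged` and `chronological_split` are hypothetical, and the series is a synthetic placeholder whose length (83 values) was chosen only so that three lags leave 80 samples, matching the counts quoted above.

```python
import numpy as np

def make_lagged(series, n_lags=3):
    """Return X with columns P(t-1), ..., P(t-n_lags) and the target y = P(t)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - n_lags                      # number of usable samples
    X = np.column_stack(
        [series[n_lags - k : n_lags - k + n] for k in range(1, n_lags + 1)]
    )
    y = series[n_lags:]
    return X, y

def chronological_split(X, y, train_fraction=0.75):
    """Split without shuffling so that the test period follows the training period."""
    n_train = int(round(train_fraction * len(y)))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

series = np.arange(83.0)                          # placeholder values, NOT rainfall data
X, y = make_lagged(series)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(X.shape, len(train_y), len(test_y))
```

Keeping the split chronological avoids using future observations to predict past ones, which matters for a forecasting model built on antecedent values.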

Fig. 6 The proposed hybrid predictive model for annual rainfall forecasting

2.4 Accuracy Assessment Criteria

Various statistics were used to measure the accuracy of the models. In this study, the correlation coefficient (R), root mean square error (RMSE) and Kling-Gupta efficiency (KGE) are used, defined as follows (Chadalawada et al. 2017; Diop et al. 2018; Yaseen et al. 2018):

$$ R=\frac{\sum_{i=1}^n\left({O}_i-\overline{O}\right)\left({P}_i-\overline{P}\right)}{\sqrt{\sum_{i=1}^n{\left({O}_i-\overline{O}\right)}^2{\sum}_{i=1}^n{\left({P}_i-\overline{P}\right)}^2}} $$
(11)
$$ RMSE=\sqrt{\frac{\sum_{i=1}^n{\left({P}_i-{O}_i\right)}^2}{n}} $$
(12)
$$ KGE=1-\sqrt{{\left(R-1\right)}^2+{\left(\beta -1\right)}^2+{\left(\gamma -1\right)}^2} $$
(13)
$$ R=\frac{\left[{\sum}_{i=1}^n\left({O}_i-\overline{O}\right)\left({P}_i-\overline{P}\right)\right]}{\sqrt{\sum_{i=1}^n{\left({O}_i-\overline{O}\right)}^2{\sum}_{i=1}^n{\left({P}_i-\overline{P}\right)}^2}}\kern1.25em \beta =\frac{\overline{P}}{\overline{O}}\kern0.5em \gamma =\frac{C{V}_P}{C{V}_O}=\frac{\frac{\sigma_P}{\overline{P}}}{\frac{\sigma_O}{\overline{O}}} $$

where Oi is the observed value, Pi is the value estimated by the MLP or MLP-WOA model, \( \overline{\mathrm{O}} \) is the mean of the observed values, \( \overline{\mathrm{P}} \) is the mean of the estimated values, \( {\sigma}_O \) and \( {\sigma}_P \) are the standard deviations of the observed and estimated values, CV is the coefficient of variation, and n is the number of data points.
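The three criteria can be computed directly from Eqs. (11) to (13). The sketch below is illustrative: the function names are hypothetical, the population standard deviation is assumed for the coefficient of variation, and the sample values are dummy numbers rather than results from the study.

```python
import numpy as np

def pearson_r(obs, pred):
    """Correlation coefficient R, Eq. (11)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    o, p = obs - obs.mean(), pred - pred.mean()
    return np.sum(o * p) / np.sqrt(np.sum(o**2) * np.sum(p**2))

def rmse(obs, pred):
    """Root mean square error, Eq. (12)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((pred - obs) ** 2))

def kge(obs, pred):
    """Kling-Gupta efficiency, Eq. (13), with beta and gamma as defined above."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    r = pearson_r(obs, pred)
    beta = pred.mean() / obs.mean()                                  # bias ratio
    gamma = (pred.std() / pred.mean()) / (obs.std() / obs.mean())    # CV ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

obs = [1.0, 2.0, 3.0, 4.0]    # dummy observed values
pred = [2.0, 3.0, 4.0, 5.0]   # dummy predictions with a constant offset
print(pearson_r(obs, pred), rmse(obs, pred), kge(obs, pred))
```

The dummy example shows why KGE complements R: a constant offset leaves the correlation at 1 but lowers the KGE through the bias and variability ratios.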

3 Application Results and Discussion

The main motivation of the current research is to examine the feasibility of a hybridized artificial intelligence model for rainfall pattern forecasting. Annual data from two meteorological stations, Fatick and Goudiry, were used to build the proposed hybrid MLP-WOA model and the standalone MLP model used for comparison. Prior to forecasting, the lag times used to construct the predictors of the forecasting matrix were determined by correlation analysis. The three input combinations, incorporating three different lag times, are summarized in Table 3.

Table 3 Modelling input combination structure

Several statistical performance metrics, covering goodness of fit and absolute error, were used to validate the capacity of the proposed model. The results of the statistical analysis for annual rainfall forecasting at the Fatick and Goudiry stations using the MLP-WOA and MLP models are tabulated in Table 4. For the Fatick station, MLP3 with a network structure of 3-8-1 attained RMSE = 168.9 mm, R = 0.63 and KGE = 0.539 over the training phase and RMSE = 159.2 mm, R = 0.67 and KGE = 0.174 over the testing phase, the most precise results among the MLP models. The hybrid MLP-WOA3 model, incorporating three lag times and using the same 3-8-1 structure, attained RMSE = 162.5 mm, R = 0.70 and KGE = 0.562 for the training period and RMSE = 130.0 mm, R = 0.69 and KGE = 0.401 for the testing period, outperforming the standalone MLP model.

Table 4 The results of statistical analysis for MLP-WOA and MLP models

At Fatick, MLP-WOA3 decreased the RMSE relative to the standalone MLP3 by 18.3% and increased the correlation coefficient R by 3.0% over the testing period. A broadly similar pattern was observed for the Goudiry station, where MLP3 with a network structure of 3-5-1 achieved RMSE = 132.4 mm, R = 0.69 and KGE = 0.494 over the training period and RMSE = 101.9 mm, R = 0.59 and KGE = 0.138 over the testing period. MLP-WOA3, with the same input lags and the same 3-5-1 structure, achieved RMSE = 95.3 mm, R = 0.74 and KGE = 0.571 over the training period and RMSE = 105.9 mm, R = 0.65 and KGE = 0.264 over the testing period, which was the highest accuracy obtained with the proposed MLP-WOA model at this station. In percentage terms, over the testing period MLP-WOA3 increased the RMSE relative to MLP3 by 3.9%, while R and KGE improved by 10.2% and about 91% (from 0.138 to 0.264), respectively. Thus, MLP-WOA3 was not able to reduce the RMSE of the corresponding MLP model at the Goudiry station and is therefore not recommended at this station. This might be due to the high stochasticity of the rainfall data, which are influenced by several other climatological variables such as air temperature, humidity, evaporation and wind speed.
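The relative changes between the two models follow from the testing-phase values quoted in the text. The short calculation below is a sketch using only those quoted numbers; the helper name `percent_change` is hypothetical.

```python
def percent_change(reference, value):
    """Relative change of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference

# Testing-phase values quoted in the text: (MLP3, MLP-WOA3)
testing = {
    "Fatick":  {"RMSE": (159.2, 130.0), "R": (0.67, 0.69), "KGE": (0.174, 0.401)},
    "Goudiry": {"RMSE": (101.9, 105.9), "R": (0.59, 0.65), "KGE": (0.138, 0.264)},
}

for station, metrics in testing.items():
    for name, (mlp, mlp_woa) in metrics.items():
        change = percent_change(mlp, mlp_woa)
        print(f"{station} {name}: {change:+.1f}%")
```

A negative change in RMSE and a positive change in R or KGE both indicate an improvement of the hybrid model over the standalone MLP.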

For a more representative evaluation of the applied forecasting models, Fig. 7 shows the observed and predicted annual rainfall over the testing period for both stations. The figure also presents scatter plots of the observed (x-axis) against the predicted (y-axis) annual rainfall, with the relationship between forecasted and observed data expressed as a linear regression equation. At the Fatick station, MLP-WOA3 clearly gave better agreement with the observed annual rainfall than the comparable MLP3, and its forecasts were closer to the exact line than those of MLP3. For the Goudiry station, the performances of MLP3 and MLP-WOA3, the best models, were approximately similar; however, the error of MLP3 was lower than that of MLP-WOA3, so MLP3 can be recommended for annual rainfall forecasting at this station. For both stations, the three antecedent values of annual rainfall together provided informative historical memory for building the forecasting model, whereas less information could be obtained from the first two lags alone. For the Goudiry station, the hybrid MLP-WOA model could not attain prediction performance over the testing period as high as over the training phase. This might be because the testing phase contained patterns in the time series that were not experienced during the training phase. In addition, given its location, the Goudiry station could be substantially influenced by neighboring synoptic climate features. Thus, more related climate information could be incorporated when building the forecasting model at the Goudiry station.

Fig. 7 Modeled rainfall of the testing phase by MLP and MLP-WOA, and scatter diagrams at the 2 stations, Fatick and Goudiry

4 Conclusion

Forecasting annual rainfall is essential for proper management of water resources, and rigorous investigation of the hydrological effects of floods and droughts usually requires long-term rainfall forecasts. In the current research, annual rainfall was predicted using a hybrid model, MLP-WOA, which is an MLP model optimized by the whale optimization algorithm. For that purpose, annual rainfall data between 1933 and 2013 from two stations in Senegal, Fatick and Goudiry, were used. The precision of the MLP and proposed MLP-WOA models in predicting annual rainfall from historical rainfall data was examined using the RMSE, R and KGE statistics. The results indicated that the accuracy of MLP-WOA was slightly better than that of the standalone MLP, and it can be recommended for predicting annual rainfall in the study area.