1 Introduction

Changes in precipitation patterns affect water resources, agricultural production and global biodiversity. Accurate estimation and forecasting of precipitation are therefore necessary.

In recent years, soft computing methods have been used to estimate and forecast precipitation. Abbot and Marohasy (2014) applied an artificial neural network (ANN) to forecast Queensland monthly rainfall as a continuous variable and found that the ANN forecasts were superior to the official Australian forecasts. Wu et al. (2010) compared a modular ANN (MANN) with ANN, K-nearest-neighbors (K-NN) and linear regression (LR) for forecasting monthly rainfall series. Their results indicated a higher capability of MANN compared with the other selected models. Freiwan and Cigizoglu (2005) examined the precision of the ANN technique for estimating monthly precipitation. Valverde Ramírez et al. (2005) used ANN to forecast daily rainfall, and their results demonstrated that ANN performs better than the linear regression model. Similar results were reported by Mekanik et al. (2013), who compared ANN with multiple regression analysis (MR). Moustris et al. (2011) used ANN to forecast the monthly maximum, minimum, mean and cumulative precipitation in Greece.

The genetic algorithm (GA) has been used to model hydrological processes for which the exact relationship between the components is unknown. Nasseri et al. (2008) developed a model that combines ANN with GA to simulate the rainfall field. Their results showed that the coupled model is more accurate than the single ANN. Sedki et al. (2009) studied the efficiency of GA for rainfall–runoff forecasting.

Support Vector Machine (SVM) (Vapnik 1995, 1998; Shamshirband et al. 2014) is based on structural risk minimization (SRM), which improves its ability to handle classification, pattern recognition and regression problems (Lu et al. 2011; Yang et al. 2014; Dileep and Sekhar 2014; Guo et al. 2014; Harris 2015). SVMs fall into two main categories, namely Support Vector Classification (SVC) and Support Vector Regression (SVR). Recently, SVM has gained importance in forecasting and estimation problems of precipitation (Tripathi et al. 2006; Ortiz-García et al. 2014; Sánchez-Monedero et al. 2014). Each SVR model includes three parameters: gamma (γ), cost (C) and epsilon (ε).

Over the past years, SVM combined with the wavelet transform (WT) has attracted enormous interest in engineering applications (Peng and Chu 2004; Wang and Li 2011; Kalteh 2013, 2015; Du et al. 2013; Sahay and Srivastava 2014; Shiau and Huang 2014; Liang and Juang 2015; Sun et al. 2015; Yong et al. 2015; Mohammadi et al. 2015). Kisi and Cimen (2012) employed a WT-SVM model to forecast daily precipitation. They concluded that the hybrid approach is more precise than the single SVM. Nourani et al. (2014) used satellite precipitation data in combination with the feed-forward neural network (FFNN) and the WT to model the rainfall–runoff process on daily and multi-step-ahead time scales. They concluded that applying the WT to the runoff data improves the capability of the FFNN rainfall–runoff models to predict runoff peak values. Venkata Ramana et al. (2013) used a Wavelet Neural Network (WNN) model, which combines wavelet analysis and ANN, to forecast monthly rainfall at the Darjeeling rain gauge station, India. Feng et al. (2015) implemented a coupled wavelet analysis-support vector machine (WA-SVM) model for monthly rainfall forecasting in arid regions.

In this study, estimation models based on three soft computing methods, namely WT-SVM, ANN and genetic programming (GP), were developed. Precipitation data from Serbia were used as a case study.

2 Material and Methods

2.1 Study Area and Data Collection

The study area was Serbia, which is located in the central part of the Balkan Peninsula between 41°53′ and 46°11′ latitude North and 18°49′ and 23°00′ longitude East, with an area of 88,407 km². The climate of the country is temperate continental, with a gradual transition between the four seasons of the year. West Serbia is the wettest part of the country, while the north and south are the driest. The rivers in Serbia belong to the watersheds of the Black, Adriatic and Aegean seas. More than 90 % of the Serbian territory is drained by rivers that join the Danube and thus flow to the Black Sea.

Riverine and torrential floods are the most significant natural hazards on the territory of Serbia. A torrential flood is the sudden appearance of maximal discharge in a river bed with a high concentration of the hard phase. Owing to the climate, the specific characteristics of the relief, the differences in soil and vegetation cover, and the socio-economic conditions, torrential flood waves are one of the resulting forms of the existing erosion processes. Thus, the accurate estimation of precipitation will help decision makers to plan water resources, agricultural production and land use in the region.

Series of monthly precipitation data were collected from 29 principal meteorological stations for the period 1946–2012 (Fig. 1). A detailed analysis of these data was carried out in Gocic and Trajkovic (2014a, b) using different multivariate analysis techniques such as R-mode principal component analysis, factor analysis and agglomerative hierarchical cluster analysis, as well as the more common descriptive statistical analysis. The mean annual precipitation varied between 549.7 and 954.9 mm, while the average precipitation for the whole country was 662.4 mm with a standard deviation of 132 mm. Most precipitation falls in June and May, while February has the least rainfall. The largest amounts of precipitation fall in the regions along the major rivers such as the Danube, the Sava and the Velika Morava.

Fig. 1 Spatial distribution of the 29 principal meteorological stations in Serbia

For the experiments, the inputs (month and year) and the output (precipitation) were collected from the 29 stations, averaged over Serbia and supplied to the learning techniques. The first 60 % of the data was used for training and the subsequent 40 % for testing.
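The data preparation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and the split is assumed to be chronological because the text speaks of the "subsequent" part of the data.

```python
def average_over_stations(station_series):
    """Average equally long monthly series from several stations, month by month."""
    return [sum(values) / len(values) for values in zip(*station_series)]


def chronological_split(series, train_fraction=0.6):
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


# Hypothetical country-averaged monthly totals in mm (illustrative values only).
monthly_mm = [41.0, 38.5, 44.2, 55.1, 70.3, 84.6, 62.0, 51.7, 49.9, 47.3]
train, test = chronological_split(monthly_mm, train_fraction=0.6)
print(len(train), len(test))  # prints: 6 4
```

Keeping the test block strictly after the training block avoids evaluating a model on months that precede its training data.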

2.2 Support Vector Machine (SVM)

The SVM (Vapnik 1995, 1998) is based on the principles of statistical machine learning. One of its advantages is structural risk minimization, which minimizes an upper bound on the generalization error rather than the local training error. SVM estimates the function as:

$$ f(x)=w\varphi (x)+b $$
(1)
$$ {R}_{SVMs}(C)=\frac{1}{2}{\left\Vert w\right\Vert}^2+C\frac{1}{n}{\displaystyle {\sum}_{i=1}^nL\left({x}_i,{d}_i\right)} $$
(2)

where \( R={\left\{{x}_i,\;{d}_i\right\}}_{i=1}^n \) is the set of data points, \( {x}_i \) is the input vector, \( {d}_i \) is the target value, n is the size of the data set, φ(x) is the mapping into the feature space, w is a normal vector, b is a scalar and \( C\frac{1}{n}{\displaystyle {\sum}_{i=1}^nL\left({x}_i,{d}_i\right)} \) is the empirical error. The parameters w and b are estimated by minimizing Eq. (2) in the form:

Minimize

$$ {R}_{SVMs}\left(w,\;{\xi}^{\ast}\right)=\frac{1}{2}{\left\Vert w\right\Vert}^2+C{\displaystyle {\sum}_{i=1}^n\left({\xi}_i+{\xi}_i^{*}\right)} $$
(3)

Subject to \( \left\{\begin{array}{l}{d}_i-w\varphi \left({x}_i\right)-b\le \varepsilon +{\xi}_i\hfill \\ {}w\varphi \left({x}_i\right)+b-{d}_i\le \varepsilon +{\xi}_i^{*}\hfill \\ {}\kern1.08em {\xi}_i,\;{\xi}_i^{*}\ge 0,\;i=1,\dots, l\hfill \end{array}\right. \) where \( {\xi}_i \) and \( {\xi}_i^{*} \) are positive slack variables, \( \frac{1}{2}{\left\Vert w\right\Vert}^2 \) is the regularization term, C is the penalty factor, ε is the tolerance of the ε-insensitive loss function and l is the number of elements in the training dataset.

The function in Eq. (1) is estimated by introducing Lagrange multipliers and applying the optimality conditions, which gives the function in Eq. (4).

$$ f\left(x,\;{a}_i,\;{a}_i^{*}\right)={\displaystyle {\sum}_{i=1}^n\left({a}_i-{a}_i^{*}\right)}\;K\left(x,\;{x}_i\right)+b $$
(4)

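where \( K\left(x,\;{x}_i\right)=\varphi (x)\varphi \left({x}_i\right) \) is the kernel function and \( {a}_i \) and \( {a}_i^{*} \) are the Lagrange multipliers.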
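To make Eq. (4) concrete, the sketch below evaluates the kernel expansion for a single input and also shows the ε-insensitive loss that underlies the constraints. It assumes a radial basis function kernel with parameter γ, which the text does not state explicitly; the support vectors, coefficients and the names `rbf_kernel`, `svr_predict` and `epsilon_insensitive_loss` are hypothetical and serve only as an illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2) (assumed kernel)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(target, prediction, epsilon):
    """Zero while the error stays inside the epsilon tube, linear outside it."""
    return max(0.0, abs(target - prediction) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate Eq. (4): sum_i (a_i - a_i^*) K(x, x_i) + b.

    dual_coefs[i] holds the difference (a_i - a_i^*) for support vector i.
    """
    return b + sum(
        coef * rbf_kernel(x, sv, gamma)
        for coef, sv in zip(dual_coefs, support_vectors)
    )


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
dual_coefs = [0.5, -0.25]
estimate = svr_predict((0.0, 0.0), support_vectors, dual_coefs, b=1.0, gamma=1.2)
```

Only training points with a nonzero difference between the two multipliers contribute to the sum, which is why a trained SVR model can be stored as a list of support vectors and their coefficients.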

2.3 Discrete Wavelet Transform Algorithm

The wavelet transform (WT) is a mathematical tool for decomposing a time series into several components so that each component can be analyzed more effectively (Adamowski and Chan 2011; Jawerth and Sweldens 1994).

The discrete wavelet transform (DWT) is represented as:

$$ {W}_x\left(m,\;n,\;\psi \right)={a}_0^{-m/2}\;{\displaystyle {\int}_{-\infty}^{\infty }f(t)}\;{\psi}^{*}\;\left({a}_0^{-m}t-n{b}_0\right)dt $$
(5)

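where ψ* is the complex conjugate of the mother wavelet ψ, m and n are integers, and the scale parameter a and the translation parameter b are discretized as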

$$ \begin{array}{ll}a={a}_0^m,\hfill & b=n{a}_0^m{b}_0\hfill \end{array} $$
(6)
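For the common dyadic choice \( {a}_0=2 \), \( {b}_0=1 \), the DWT can be computed with a filter bank that splits a series into approximation and detail coefficients. The sketch below uses the Haar wavelet purely as an assumption, since the text does not name the mother wavelet; the function names are hypothetical.

```python
import math


def haar_dwt_level(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt_level(approx, detail):
    """Invert one Haar level, recovering the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) * scale, (a - d) * scale])
    return signal


def haar_decompose(signal, levels):
    """Repeat the one-level split on the approximation; details are ordered fine to coarse."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details
```

In a hybrid wavelet model, sub-series of this kind, rather than the raw series, would typically be passed to the learning algorithm as inputs.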

2.4 Artificial Neural Networks

The multi-layer network with a back propagation learning algorithm is one of the most popular neural network architectures. A neural network consists of three layers:

  1. an input layer;
  2. an output layer; and
  3. an intermediate or hidden layer.

The input vector is \( D\in {R}^n \), \( D={\left({X}_1,\;{X}_2,\dots, {X}_n\right)}^T \); the outputs of the q neurons in the hidden layer are \( Z={\left({Z}_1,\;{Z}_2,\dots, {Z}_q\right)}^T \); and the outputs of the output layer are \( Y\in {R}^m \), \( Y={\left({Y}_1,\;{Y}_2,\dots, {Y}_m\right)}^T \). Assuming that the weight and the threshold between the input layer and the hidden layer are \( {w}_{ij} \) and \( {\theta}_j \), respectively, and that the weight and the threshold between the hidden layer and the output layer are \( {w}_{jk} \) and \( {\theta}_k \), respectively, the outputs of each neuron in the hidden layer and the output layer are:

$$ {Z}_j=f\left({\displaystyle {\sum}_{i=1}^n{w}_{ij}{X}_i-{\theta}_j}\right) $$
(7)
$$ {Y}_k=f\left({\displaystyle {\sum}_{j=1}^q{w}_{jk}{Z}_j-{\theta}_k}\right) $$
(8)

where f() is a transfer function, which is the rule for mapping the neuron's summed input to its output and, with a suitable choice, a means of introducing non-linearity into the network design. One of the most commonly used functions is the sigmoid function, which is monotonically increasing and ranges from zero to one.
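Equations (7) and (8) amount to two applications of the same layer rule. The sketch below shows a forward pass under the assumption that the sigmoid is used in both layers; the function names and the weight layout (`weights[i][j]` connects input i to neuron j) are illustrative choices, not taken from the toolbox used in this study.

```python
import math


def sigmoid(value):
    """Logistic transfer function, monotonically increasing with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def layer_output(inputs, weights, thresholds):
    """Apply f(sum_i w_ij * input_i - theta_j) for every neuron j of a layer."""
    return [
        sigmoid(
            sum(weights[i][j] * inputs[i] for i in range(len(inputs))) - thresholds[j]
        )
        for j in range(len(thresholds))
    ]


def forward(x, w_hidden, theta_hidden, w_output, theta_output):
    """Eq. (7) followed by Eq. (8): input -> hidden outputs Z -> network outputs Y."""
    z = layer_output(x, w_hidden, theta_hidden)
    return layer_output(z, w_output, theta_output)


# Hypothetical network with n = 2 inputs, q = 3 hidden neurons and m = 1 output.
w_hidden = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
theta_hidden = [0.1, 0.0, -0.2]
w_output = [[0.6], [-0.3], [0.8]]
theta_output = [0.05]
y = forward([1.0, 0.5], w_hidden, theta_hidden, w_output, theta_output)
```

Back propagation then adjusts the weights and thresholds so that the outputs Y approach the target values; that training step is not shown here.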

2.5 Genetic Programming

Genetic programming (GP) is an evolutionary algorithm based on Darwinian theories of natural selection and survival. It approximates the equation, in symbolic form, that best describes how the output relates to the input variables. The algorithm considers an initial population of randomly generated programs (equations), derived from the random combination of input variables, random numbers and functions, which include arithmetic operators (+, −, ×, and ÷), mathematical functions (sin, cos, exp, log), and logical/comparison functions, which must be chosen appropriately on the basis of some understanding of the process. This population of potential solutions is then subjected to an evolutionary process, and the "fitness" (a measure of how well they solve the problem) of the evolved programs is evaluated. The individual programs that best fit the data are then selected from the initial population. The programs that provide the best fit are selected to exchange part of their information to produce better programs through "crossover" and "mutation", which mimic the natural world's reproduction process. Exchanging parts of the best programs with one another is called crossover, and randomly changing programs to create new programs is called mutation. The programs that fit the data less well are discarded. This evolution process is repeated over successive generations and is driven towards finding symbolic expressions that describe the data, which can be scientifically interpreted to derive knowledge about the process.
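The evolutionary loop described above can be illustrated with a very small symbolic-regression sketch. Everything in it is an assumption made for illustration: programs are nested tuples over a single variable x, only +, − and × are used, fitness is the mean absolute error, the better half of each generation survives, and the population size, mutation rate and size cap are arbitrary. It is not the GP configuration used in this study.

```python
import random

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
MAX_NODES = 50  # arbitrary size cap that keeps the evolved programs small


def evaluate(tree, x):
    """Evaluate a program: the variable 'x', a float constant, or (op, left, right)."""
    if tree == "x":
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return tree


def random_tree(rng, depth):
    """Randomly combine the variable, random constants and operators."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def fitness(tree, samples):
    """Mean absolute error over (x, y) samples; lower is better."""
    return sum(abs(evaluate(tree, x) - y) for x, y in samples) / len(samples)


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of the program."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(parts)


def crossover(rng, parent_a, parent_b):
    """Copy parent_a with one random node replaced by a random part of parent_b."""
    path, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path, donor)


def mutate(rng, tree):
    """Replace one random node with a freshly generated random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace_at(tree, path, random_tree(rng, 2))


def evolve(samples, generations=30, population_size=60, seed=0):
    """Keep the better half each generation and refill by crossover and mutation."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, samples))
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            child = crossover(rng, rng.choice(survivors), rng.choice(survivors))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) <= MAX_NODES:
                children.append(child)
        population = survivors + children
    return min(population, key=lambda t: fitness(t, samples))


# Hypothetical target relation y = x^2 + x, sampled at five points.
samples = [(x, x * x + x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best = evolve(samples, generations=10)
```

Because the surviving half is carried over unchanged, the best fitness in the population cannot get worse from one generation to the next.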

2.6 Evaluation Criteria

To assess the performance of the WT-SVM, GP and ANN models, the following statistical parameters were used:

  1. mean absolute error (MAE)

    $$ MAE=\frac{1}{n}{\displaystyle \sum_{i=1}^n\left|{P}_i-{O}_i\right|}, $$
    (9)

  2. mean absolute percentage error (MAPE)

    $$ MAPE=\frac{100\%}{n}{\displaystyle \sum_{i=1}^n\left|\frac{O_i-{P}_i}{O_i}\right|}, $$
    (10)

  3. root mean square error (RMSE)

    $$ RMSE=\sqrt{\frac{{\displaystyle \sum_{i=1}^n{\left({P}_i-{O}_i\right)}^2}}{n}}, $$
    (11)

  4. Pearson correlation coefficient (r)

    $$ r=\frac{n\left({\displaystyle \sum_{i=1}^n{O}_i\cdot {P}_i}\right)-\left({\displaystyle \sum_{i=1}^n{O}_i}\right)\cdot \left({\displaystyle \sum_{i=1}^n{P}_i}\right)}{\sqrt{\left(n{\displaystyle \sum_{i=1}^n{O}_i^2}-{\left({\displaystyle \sum_{i=1}^n{O}_i}\right)}^2\right)\cdot \left(n{\displaystyle \sum_{i=1}^n{P_i}^2}-{\left({\displaystyle \sum_{i=1}^n{P}_i}\right)}^2\right)}} $$
    (12)

  5. coefficient of determination (R²)

    $$ {R}^2=\frac{{\left[{\displaystyle \sum_{i=1}^n\left({O}_i-\overline{O}\right)\cdot \left({P}_i-\overline{P}\right)}\right]}^2}{{\displaystyle \sum_{i=1}^n{\left({O}_i-\overline{O}\right)}^2}\cdot {\displaystyle \sum_{i=1}^n{\left({P}_i-\overline{P}\right)}^2}} $$
    (13)

    where \( {P}_i \) [mm] and \( {O}_i \) [mm] are the predicted and observed values of precipitation, respectively, \( \overline{P} \) and \( \overline{O} \) are their mean values, and n is the total number of test data.
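The five criteria can be computed directly from paired observed and predicted series, as in the sketch below. The function names and the sample values are hypothetical. R² is computed as the squared correlation, which assumes squared deviations in the denominator of Eq. (13), and MAPE is undefined whenever an observed value is zero.

```python
import math


def mae(observed, predicted):
    """Mean absolute error, Eq. (9)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, Eq. (10); requires nonzero observations."""
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def rmse(observed, predicted):
    """Root mean square error, Eq. (11)."""
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient, Eq. (12)."""
    n = len(observed)
    sum_o, sum_p = sum(observed), sum(predicted)
    sum_op = sum(o * p for o, p in zip(observed, predicted))
    sum_oo = sum(o * o for o in observed)
    sum_pp = sum(p * p for p in predicted)
    numerator = n * sum_op - sum_o * sum_p
    denominator = math.sqrt((n * sum_oo - sum_o ** 2) * (n * sum_pp - sum_p ** 2))
    return numerator / denominator


def r_squared(observed, predicted):
    """Coefficient of determination as squared correlation (assumed form of Eq. 13)."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance ** 2 / (spread_o * spread_p)


# Hypothetical monthly values in mm, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mae(observed, predicted))  # prints: 2.0
```

MAE and RMSE are expressed in mm, MAPE in percent, and r and R² are dimensionless.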

3 Results and Discussion

3.1 Experimental Data

The soft computing models were trained with estimated precipitation data. The testing and training data are presented in Fig. 2. Of the data, 70 % was used for training and 30 % for testing the soft computing models.

Fig. 2 Total number of a training data and b testing data for Serbia

3.2 Performance Analysis

In this study, the Demuth Neural Network toolbox in MATLAB R2010a was used to develop the ANN model, and libSVM (Chau and Wu 2010) in MATLAB was used to develop the SVM model. For the SVM model, the three parameters C, ε and γ were set to 2.1, 0.5 and 1.2, respectively.

The five statistical indicators, i.e., MAE, MAPE, RMSE, r and R², were used to compare the agreement between estimated and actual values for ANN and GP with that for WT-SVM. Table 1 summarizes the comparison results for precipitation estimation in Serbia. The WT-SVM clearly provides higher accuracy for precipitation estimation.

Table 1 Performance indices of three soft computing approaches for precipitation estimation in Serbia

Figure 3 illustrates the scatter plots of the observed precipitation values against the values estimated by the WT-SVM, GP and ANN techniques. The results show good agreement between the values estimated by WT-SVM and the observed values, with no points of significant overestimation or underestimation.

Fig. 3 Scatter plot of the a WT-SVM, b GP and c ANN precipitation estimation

Figure 4 illustrates the time series of precipitation estimates, while Fig. 5 shows the residuals of the ANN, GP and WT-SVM models in the test period. The WT-SVM model offers higher accuracy than the ANN and GP models. In addition, the WT-SVM predicts the maximum peak better than ANN and GP. Similar results were reported by Kisi and Cimen (2012), Venkata Ramana et al. (2013) and Feng et al. (2015).

Fig. 4 Time series of monthly precipitation estimates of ANN, GP and WT-SVM in test period

Fig. 5 Residuals of monthly precipitation estimates of ANN, GP and WT-SVM in test period

4 Conclusion

In this paper, three soft computing methods, WT-SVM, GP and ANN, were used to estimate precipitation. The precipitation data were collected from 29 principal meteorological stations in Serbia for the period 1946–2012. The WT-SVM model was developed by combining SVM with WT. The proposed hybrid method can improve precision compared with previously developed models. The performance of the simulation results was evaluated using five statistical parameters (MAE, MAPE, RMSE, r, R²).

The results indicated that the WT-SVM model estimates precipitation more accurately than the ANN and GP models. Therefore, the WT-SVM model can be regarded as an efficient and useful soft computing model for predicting precipitation. In future studies, the algorithm can also be applied to other regions to verify the efficiency of the model under different weather conditions.