1 Introduction

In nature, several hydrological processes occur to sustain the water cycle. Some of them include rainfall-runoff process, groundwater process, river water systems etc. These processes need to be modelled by making use of events that occurred in the past to predict. Adequate supply of pure water is essential for maintaining health and sanitary conditions (Bhardwaj and Parmar 2013a). River water is the most important resource for drinking water, ground water and it also affects the climate of that area. Yamuna river is the largest tributary of Ganga in the northern republic of India. It originates from the Yamunotri glacier (380 59’ N 780 27’ E) on the south western slopes of Banderpooch peaks within the lower range of mountains in Uttarakhand. Map of the sample location is shown in Map 1. It crosses many states, Uttarakhand, Haryana, Himachal Pradesh, Delhi, Uttar Pradesh by travel a complete length of 1,376 kilometers. It has a mixing of drainage system of 366,233 km2 before merging with Ganga at Allahabad, a total of 40.2 % of the entire Ganga basin (CPCB and Water Quality Status of Yamuna River 2006). Nizamuddin (sample site) is approximately 14 km downstream from the Wazirabad barrage at Delhi (capital of India) and 410 km from Yamunotri. Pollution in river water is continuously increasing due to urbanization, industrialization, population growth etc. The water quality at Nizamuddin (Delhi) has the impact of industrialization, sewerage, domestic discharge from Haryana and Delhi. Many rivers are dying due to pollution, which is an alarming signal (Parmar et al. 2009).

Map 1
figure 1

Sample site description

There are many different methods of mathematical modelling, which have been developed to forecast long-term precipitation (Rangarajan and Ding 2000; Kahya and Kalayci 2004; Shukla et al. 2008; Doyle and Barros 2011; Dökmen and Aslan 2013). In these methods, a statistical approach, such as time series modelling, is a conventional method that has been widely used (Zhang et al. 2011; Sachindra et al. 2012; Underwood 2012; Diodato et al. 2014; Parmar and Bhardwaj 2014). Statistical modelling has many advantages over mathematical models. But the shortcomings of the statistical approach include handling nonlinear characteristics of data because the statistical models are usually based on the linear correlations of the data can be expressed with a correlation coefficient. To overcome the shortcomings of the statistical methods, many other models that address the nonlinearity of data were developed, including an artificial neural network (ANN), fuzzy, wavelet are capable methods (Nayak et al. 2004; Partal and Kisi 2007; Yeniguna and Ecer 2012). ANN is a pattern matching technique that uses a pair of input and output sets and has a learning algorithm for improving the weights between the nodes (Hsu et al. 1995; Jeong et al. 2012). FIS is another data-driven model that is similar to ANN and would be a proper method for representing linguistic fuzzy if-then rules that are difficult to formulate through a model with crisp parameters (Yeon et al. 2009). A fuzzy logic model was developed to estimate flow for poorly gauged mountainous basins, using Mamdani approach. The entire data were distributed into two parts, namely training and testing. The results of the model were compared with calculated data. This fuzzy model provides more precise and consistent results (Toprak et al. 2004; Aksoy et al. 2004; Hung et al. 2009; Toprak et al. 2009; Chen and Chang 2010). Wavelet analysis in the last decade has become an important mathematical tool for research. Wavelet analysis has both good time and frequency multi resolution, which can use effectively to diagnose signal. Wavelet can typically be visualized as a “brief oscillation” like the signal that might be seen from a seismograph or a heart monitor (Grapes 1995; Bhardwaj et al. 2011; Bhardwaj and Parmar 2013b). Wavelet analysis is a better tool than the Fourier transform. Wavelets used for time-scale demonstration of the time series and their interaction to resolve time series with non-stationarities (Adamowski and Chan 2011).

In this study, Neuro, fuzzy and wavelet coupled model is introduced to predict the monthly COD level of Yamuna river considering past history data of 10 years. The study has an importance as the quality of water impacts the 70 % supply of Delhi water from Yamuna river and more than 57 million people depend on river water for their daily usage.

2 Methodology

2.1 Sample Site

The monthly mean value of water quality parameter COD (Chemical Oxygen Demand) monitored at Nizamuddin of river Yamuna in Delhi (India) of the last 10 years have been considered sample site in the present study. Out of 120 samples from 1999 to 2009, 108 have been used to develop the prediction model for COD level time series and 9 data have been used for the test model. First three data points are used as the input values. Sample site is selected according as the utilization of river water and population ratio. It shows in Map 1 below.

2.2 Regression Analysis

It is a technique used for modeling and analyzing several variables. Regression analysis helps in understanding the variation in value of the dependent variable as independent variables is varied, while the other independent variables are held fixed. Regression line of Y (dependent variable) on X (independent variable) (Parmar and Bhardwaj 2014) defined as

$$ Y={b}_{yx}X+C $$

where C is a constant of integration.

  • b yx = regression coefficient \( =r\times \frac{\sigma_y}{\sigma_x} \).

  • r = Correlation coefficient \( =\frac{E(XY)-E(X)E(Y)}{\sqrt{\left(E\left({X}^2\right)-E{(X)}^2\right)\left(E\left({Y}^2\right)-E{(Y)}^2\right)}}=\frac{\operatorname{cov}\left(X,Y\right)}{\sigma_X{\sigma}_Y} \)

  • σ Y , σ X are Standard Deviation of variables Y and X respectively.

2.3 Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN) is commanding non-linear modelling approach. Which is based upon human brain functioning. It identifies and learns the correlated patterns between input values and objective values. It networks with nodes or neurons, which are interconnected to each other. A general three-layered neural network, consists of several elements nodes. These systems hold an input layer comprising of hubs speaking to diverse input variables, the hidden layer comprising of numerous concealed hubs and a output layer comprising of output variables. ANN as a forecasting method, is broadly connected in hydrology and water asset studies. Feed-forward neural networks are successfully applied in many different studies (Chang and Chang 2006; Chaturvedi, et al. 2004). Partially recurrent systems begin with a completely repetitive net and include a feed-forward association that detours the repeat, adequately treating the recurrent part as a state memory. Recurrent networks are mostly used in nonlinear time series prediction, pattern analysis and recognition system.

A three layered FFNN based on backpropogation has been selected. There are three input nodes (depending upon input variable), three hidden nodes with tan sigmoid transfer function and one output node with linear transfer function. So for predicting the output, past three inputs have to be estimated. The momentum and learning rate has kept to 0.9 and 0.1 and it trained for 2,000 epochs.

2.4 Wavelet Transforms

Wavelet could be depicted as a pulse of short time with finite energy that integrates to zero. Wavelets are located both in time and space. Wavelet method crosses the boundaries of the Fourier technique by utilizing capacities that hold a convenient tradeoff between time location and frequency detail. Wavelet transform can be used to locate the information when and where a particular event occurs (Daubechies 1992). Thus, it is very useful in matching the trends in a given time series of data. Wavelet analysis employs a prototype function called mother wavelet g (t). This mother wavelet function has zero average and piercingly drops in an oscillatory way (Can et al. 2005; Labat 2008). Data is described in two versions, superposition of scaled and translated of the pre specified mother wavelet. Continuous wavelet transform (CWT) of a given signal x(t) with respect to g(t) is defined in (1) where a, b are scale and translation factor respectively (Daubechies 1992).

$$ CWT\left(a,b\right)=\frac{1}{\sqrt{a}}{\displaystyle \underset{-\infty }{\overset{+\infty }{\int }}x(t)g\left(\frac{t-b}{a}\right)}dt $$
(1)

A continuous wavelet transforms (a, b) coefficient, depicts how accurately signal x(t) and the mother wavelet matches at a particular scale and translation. Thus, the set of all wavelet coefficient CWT (a, b), associated with a particular signal x (t), is the wavelet representation of the signal with respect to the mother wavelet g(t).

The first step of wavelet transforms corresponds to the mapping signal f to its wavelet coefficients. Using this process two components are received i.e. a smooth version called approximation and a second component that corresponds to the deviations or details of the signal. A decomposition of the signal f into a low frequency part a 1 and a high frequency part d 1 at level 1 is represented by f = a 1 + d 1. Similarly the same procedure is performed on a 1 in order to obtain a decomposition in finer scales: a 1 = a 2 + d 2. Recursive decomposition for the low frequency parts follows the directions. The resulting low frequency parts a 1 , a 2 …,a n are approximations of f and the high frequency parts d 1 , d 2 .....,d n contain the details of f. It also classifies as follows (Bhardwaj and Parmar 2013b):

$$ \begin{array}{l}\kern6.24em f\\ {}\kern5.76em \swarrow \searrow \\ {}\kern5.52em {a}_1\kern1em {d}_1\to \mathrm{Level}\;1\\ {}\kern5.16em \swarrow \searrow \\ {}\kern4.8em {a}_2\kern1em {d}_2\to \mathrm{Level}\;2\\ {}\kern4.32em \swarrow \searrow \\ {}\kern3.96em {a}_3\kern1em {d}_3\to \mathrm{Level}\;3\\ {}\kern3.48em \swarrow \searrow \\ {}\kern3.24em {a}_4\kern1em {d}_4\to \mathrm{Level}\;4\\ {}\kern2.76em \swarrow \searrow \\ {}\kern2.4em {a}_5\kern1em {d}_5\to \mathrm{Level}\;5\end{array} $$

Let f = (f 1, f 2, f 3, ⋯, f N ), N is an even integer. \( {a}_m=\frac{f_{2m-1}+{f}_{2m}}{\sqrt{2}} \), m =1, 2, 3, …, N/2

a 1 = (a 1, a 2, …, a N/2). The first trend sub-signal (approximation)

\( {d}_m=\frac{f_{2m-1}-{f}_{2m}}{\sqrt{2}} \), m =1, 2, 3, …, N/2

1-level Transform

$$ f\overset{H_1}{\to}\left({a}^1/{d}^1\right) $$

and so on.

2.5 Adaptive Neuro-Fuzzy Inference System

The adaptive neuro-fuzzy interface system (ANFIS) methods is used worldwide, as an estimator. It is able to approximate real continuous function on a compact set to a degree of accuracy (Jang 1993). The General outlook of fuzzy inference system is seen as a model, which maps input characteristics to input membership functions. After that, these input membership functions relates with rules and then these rules relates to set of output characteristics. At last, it maps output characteristics to output membership functions, and the output membership function to a single output or a decision associated with the output Jang (1993)). There are three main parts of fuzzy systems, fuzzifier, fuzzy database and defuzzifier. Further fuzzy database consists of two main sections, fuzzy rule base, and inference mechanism (Karmakar and Mujumdar 2006; Seyed et al. 2013; Sahay and Srivastava 2014).

2.5.1 ANFIS Architecture

According to the Takagi and Sugeno type, the FIS has two inputs x & y and one output f as shown in (Fig. 3) (Chaturvedi, et al. 2004).

  1. Rule 1

    If x is A 1 & y is B 1 then

    $$ {f}_1={p}_1x+{q}_1y+{r}_1 $$
    (2)
  2. Rule 2

    If x is A 2 & y is B 2 then

    $$ {f}_2={p}_2x+{q}_2y+{r}_2 $$
    (3)
  1. Layer 1

    Every node i in this layer is a square node with a node function

    $$ {O}_i^1=\mu {A}_i(x) $$
    (4)

    Where x is the input to node i, and A i is the linguistic label related to this node function. So, O 1 i is the membership function of A i , which expresses the degree to which the given x satisfies the A i . Mostly, it is to supposed μA i (x) to be bell-shaped and maximum value to 1, minimum value to 0, such as the generalization bell function

    $$ \mu {A}_i(x)=\frac{1}{1+{\left[{\left(\frac{x-{c}_i}{a_i}\right)}^2\right]}^{b_i}} $$
    (5)

    Or Gaussian function

    $$ \mu {A}_i(x)={e}^{\left[-{\left(\frac{x-{c}_i}{a_i}\right)}^2\right]} $$
    (6)

    where {a i , b i , c i } is the parameter set. The bell shaped function behaving according as the values of parameters set change. It exhibiting various forms of membership functions on linguistic label A i . Parameters in this layer are referred as premise parameter.

  2. Layer 2

    Every node in this layer is a circle node. It multiplies the input signals and sends the product out. For instance

    $$ \begin{array}{llll}{O}_i^2=\hfill & {w}_i=\hfill & \mu {A}_i(x)\times \mu {B}_i(x),\hfill & i=1,2,3....\hfill \end{array} $$
    (7)

    Each node output represents the firing strength of a rule.

  3. Layer 3

    Every node in this layer is a circle node labeled N. The i th node measures the ratio of the i th rule’s firing strength to sum of all rule’s firing strengths

    $$ \begin{array}{llll}{O}_i^3=\hfill & {w}_i=\hfill & \frac{w_i}{w_1+{w}_2},\hfill & i=1,2,3 \dots .\hfill \end{array} $$
    (8)

    For convenience, outputs of this layer will be called normalized firing strength.

  4. Layer 4

    Every node i in this layer is a square node with a node function

    $$ {O}_i^4={\overline{w}}_i{f}_i={\overline{w}}_i={\overline{w}}_i\left({p}_ix+{q}_iy+{r}_i\right) $$
    (9)

    where \( {\overline{w}}_i \) is the output of the previous layer (i.e. layer 3) and (p i , q i , r i ) is the parameter set.

  5. Layer 5

    The single node in this layer is a circle node labeled, which computes the whole output as the sum up of all incoming signals i.e.

    $$ {O}_i^5={\displaystyle \sum {\overline{w}}_i{f}_i}=\frac{{\displaystyle \sum_i{w}_i{f}_i}}{{\displaystyle \sum_i{w}_i}} $$
    (10)

    Hence, we have developed an adaptive network that is working equivalent to a type 3 FIS.

3 Results and Discussion

Time series graph of past data of river Yamuna in Delhi is discussed in Fig. 1. This shows the past pattern of the behavior of river quality water.

Fig. 1
figure 2

Time series of monthly level of COD of River Yamuna at Nizamuddin (Delhi)

3.1 Statistical Analysis

3.1.1 Testing of Stationarity of COD Time Series

Many hydrological time series processes may be stationary or non stationary. The presence of non stationarity in a hydrological time series can result from steady natural and man-made changes in the hydrological environment. The suspicion of stationarity is especially essential in studies, which keep focus on forecast of rare event which happen convulsively in the given time series.

Stationarity is the first primary statistic property tested in the time series study. Autocorrelation function (ACF) serves as a root indicator whether non stationarity or stationarity is present in the time series. For stationarity analysis of COD level of river Yamuna at Nizamuddin station autocorrelation function (ACF), partial autocorrelation function (PACF) test have been applied. Wiee (1990) states that if the ACF decays very slowly, then the time series is non stationary and if a fast decay occurs it implies that the samples are stationary. The autocorrelation function graph which is known as correlogram is taken into consideration to visually detect the existence of stationarity. Figure 2 depicts the fast decay of ACF for 24 lags, which reveals the stationarity of monthly COD level.

Fig. 2
figure 3

Correlogram representations of COD level time series for 24 lags

According to the results, null hypotheses is rejected, which has non stationary is not detected and COD level time series is confirmed to be stationary.

3.1.2 Statistical Analysis of COD Level

Statistical properties are presented in Table 1. The minimum value (9 mg/l) and maximum value (127 mg/l) shows that the COD level variation has a long range. It can be seen that the time series show high randomness and scattered distribution about the mean as the standard deviation (27.35 mg/l) and variance (748.29 mg/l) are high. The training data considers the minimum and maximum input of data. This implies the trained neural network is not face any difficulties in extrapolation. The autocorrelation record show the assumption of stationarity is true.

Table 1 Statistical information of monthly COD level of River Yamuna at Nizamuddin (Delhi)
Table 2 Skewness of approximation coefficient using Daubechies wavelet

3.2 Neuro -Fuzzy Coupled Model

The neuro fuzzy model is trained with backpropogation for generalization and sugeno inference for specialization. The network is trained for 2,000 epochs with 0.001 tolerance level. The forecasting method uses ANFIS method to predict the next value. Graphical representation of this coupled model of ANN and Fuzzy is shown in Fig. 3.

Fig. 3
figure 4

Graphical representation of Neuro Fuzzy

3.3 Neuro -Fuzzy -Wavelet Coupled Model

Before performing wavelet decomposition, three issues need to be resolved that are selection of mother wavelet, order of mother wavelet and the number of level decomposition. Attributes of the mother wavelet and the characteristic of the signal should be considered carefully, while choosing the appropriate wavelet. Daubechies are the most appropriate for treating random and spike series. For these families of wavelet, the smoothness increases as the order of function increases and suitable wavelet and hence suitable higher order wavelet must be taken into account. In this work wavelet of order 2 to 8 has been considered and it was observed that the decomposed series by wavelet of order 8 are giving the best results.

Figure 4 shows signal approximation, a1 to a6 and detailed parts d1 to d6. Approximation curves correspond to low frequency bands and represent the trends of the COD level. Skewness is a measure of the symmetry of the data around the data mean and is zero for an ideal normal curve. Kurtosis refers to the degree of flatness. Kurtosis of normal distribution curve is three. (Fig. 4) shows that a1, a2 and a3 series are in similar shapes to the original signal than the other approximation level.

Fig. 4
figure 5

Six level detailed and approximation coefficient using Daubechies 8 (Db8) wavelet

Statistical property of a3 considering skewness shows the best matching and a4 to a7 does not provide any Meaningful information shown in Table 2, thus third level describes the regular behavior of COD level series. On examining the detail series d1 to d7 from (Fig. 4), it can be seen that range of detail parts is lower in comparison to approximation part. Further range of d4 to d7 is lower than d1 to d3, thus detailed part d4 to d7 are mainly superficial random noise, as is evident from the Kurtosis characteristic as shown in Table 3.

Table 3 Kurtosis of detailed coefficient using Daubechies wavelet
Table 4 Training and testing results by neuro fuzzy wavelet to predict nine months ahead COD level of river Yamuna using db2 to db8 wavelet

d1 to d3 detects the localized variation and provides good results. On visual and statistical inspection d1 and d2 contain useful higher frequency information and shows the irregularities representing random variations. d3 shows some peaks that allow time localization of peak COD level.

This model presents a different approach, here the idea is to decompose the COD level series using wavelet with daubechies8 wavelet and model it via individual fitting of each level of resolution. That is, each component (a3, d3, d2 and d1) is modeled separately, and the final result is obtained by adding those four forecasts. (Fig. 5) shows the diagram for this model where input is given to the ANFIS architecture to predict the wavelet coefficient.

Fig. 5
figure 6

Forecasting procedure

The total predicted at any instant (i) is given by

$$ f(i)=f{}_1(i)+{f}_2(i)+{f}_3(i)+{f}_4(i) $$
(11)

3.4 Experimental Results

The numerical experiments of predicting the future level of COD have been performed. The instrumental record data for training and testing have been collected from 1999 to 2009. 10 years data have been used for learning and the other part of 9 months for testing only. The results of learning and testing have been assessed on the basis of the mean absolute error (MAE) and the average relative error of the actual and the predicted value.

The mean absolute error (Chaturvedi, et al. 2004):

$$ MAE=\frac{1}{N}\left({\displaystyle \sum_{i=1}^N\left|{d}_i-{y}_i\right|}\right) $$
(12)

where d denote the actual measurement, y is the predicted value and N is the number of days under prediction.

The relative error:

$$ \in =\frac{\left\Vert d-y\right\Vert }{\left\Vert d\right\Vert } $$
(13)

The neuro fuzzy predictor model has been developed and tested. The results of ANFIS prediction is compared with neural network and neuro fuzzy models. The introductory experiment has been performed for the prediction of the level of COD without any previous decomposition. However, the results were not encouraging. The training and testing error is far too large due to lack of generalization ability. This was the main reason to proposed the additional step of decomposing the data into the wavelet coefficients and used neuro fuzzy for predicting the coefficients of each decomposition level.

Neuro fuzzy model is evaluated considering the following combination of input data for COD level forecasting

  1. 1.

    p (t) for prediction of p (t + τ).

  2. 2.

    p (t-τ) and p(t) for prediction of p (t + τ).

  3. 3.

    p (t-2τ), p (t-τ) and p (t) for prediction of p (t + τ).

The performance evaluations measures are mean absolute error (MAE) and relative error (є). The result shows that the best forecast was obtained with past three inputs.

Neuro fuzzy wavelet model is evaluated using wavelet sub series. Past three input, db8 is used to predict the COD level. The results are confirmed for order 2–8 as shown in Table 4 that past three input, db8 has shown the best results. The testing data which was not used in learning shows 4.68 % relative error for 9 months ahead prediction. This is very interesting phenomenon of the proposed system of prediction follow the principle of neuro, fuzzy and wavelet coupling. The neuro, fuzzy and wavelet coupled model as described in Artificial Neural Networks (ANN) is implemented to predicting the wavelet coefficients a3, d1, d2 and d3. Each coefficient is added up to predict the next future instant value. The results of predicted coefficients for training data are shown in the (Figs. 6, 7, 8 and 9).

Fig. 6
figure 7

a3 actual and trained reults

Fig. 7
figure 8

d1 actual and trained results

Fig. 8
figure 9

d2 actual and trained results

Fig. 9
figure 10

d3 actual and trained results

The trained predicted output is obtained from the decomposed wavelet coefficients by simple summation represented by S(n).

$$ S(n)={D}_1+{D}_2+{D}_3+{A}_3 $$
(14)

The actual trained and predicted trained signal is represented in the (Fig. 10).

Fig. 10
figure 11

Actual output and predicted time series of average COD level for training data using neuro fuzzy wavelet

The 7 % data which was left for testing is applied to the trained system. The actual testing coefficients obtained from wavelet decomposition are given to the ANFIS system individually. The total predicted and actual testing output is shown in (Fig. 11). Figure 12 shows the vadidation results of introduced model.

Fig. 11
figure 12

Actual output and predicted time series of average COD level of testing data using Neuro fuzzy wavelet

Fig. 12
figure 13

Validation of model

The monthly COD level prediction is also carried out by classical ANN. The ANN was trained Levemberg-Marquardt method. The sigmoid and linear activation functions are used at hidden layer and output neuron respectively. The hidden neuron was changed as (3,3,1), (3,4,1) and (3,6,1). The best results were obtained for (3,4,1) denoting three inputs, four hidden neuron and one output node. The neuro fuzzy wavelet, neuro fuzzy and ANN models are compared to each other in the Table 5. The Neuro fuzzy model shows slightly better results than the ANN, but neuro, fuzzy and wavelet coupled model forecast perfectly than other models.

Table 5 Training and testing results of each model for monthly COD level forecasting

The neuro fuzzy wavelet model has given 4.68 % relative error for the testing data and provided a good with the training data (0.21 %). The relative error for the neuro fuzzy, ANN method are 9.47, 12.28 % respectively. Therefore the testing data for these models are not showing good fit in comparison to the neuro fuzzy wavelet model. The mean absolute error for neuro fuzzy wavelet is 3.3475 mg/l.

4 Conclusions

In this work prediction the COD level of river Yamuna for next 9 months which gives satisfactory results on employing wavelet decomposition with neuro fuzzy. The comparison shows that the proposed method gives satisfactory results for prediction. The variability of data was unable to be trained by using only neuro fuzzy. Therefore, the wavelet decomposition and coefficient prediction plays a vital role in the analysis of pollution level in river water.