Introduction

Monitoring and controlling water quality is extremely important because of its implications for the environment. In situ measurements of water quality parameters have gained considerable importance worldwide, and various programs have been introduced that emphasize the measurement and identification of the key water quality parameters. Direct in situ measurements are therefore a useful tool for monitoring water quality in river, lake and stream ecosystems. New methods that help to estimate water quality parameters in the absence of any measurement are consequently very welcome. One of the most important water quality parameters is the dissolved oxygen concentration (DO). DO is used in water quality indices, has been integrated as a component of some water quality models, and is an important indicator of water pollution. Given the importance of DO, extensive effort is being made to develop robust models that can help to estimate it. Methods such as artificial neural networks, fuzzy logic and neuro-fuzzy systems, evolutionary models, wavelet decomposition models, and extreme learning machines have been employed and specifically developed for this purpose (Heddam 2014a, b, c; 2016a, b, c). In all of these models, DO is linked to water quality variables used as inputs. It is therefore of interest to investigate whether or not a new modelling strategy leads to more accurate models. In the present investigation we present a new method for predicting DO without water quality variables. We use only the components of the Gregorian calendar: (1) the year (YY), (2) the month of the year (MM), (3) the day of the month (DD), and (4) the number of complete hours that have passed since midnight (HH).

Materials and methods

Study area and data set

In the present study we selected DO data from two stations: USGS ID 14210000, Clackamas River at Estacada, Oregon, USA (Latitude 45°18′00″, Longitude 122°21′10″, NAD27), and USGS ID 14211010, Clackamas River near Oregon City, Oregon, USA (Latitude 45°22′46″, Longitude 122°34′34″, NAD27). Figure 1 shows the locations of the two stations in the study area. For each station the data set was divided into three subsets: (1) a training set (60 %), (2) a validation set (20 %) and (3) a test set (20 %). The historical DO data are available from the United States Geological Survey (USGS) website, http://or.water.usgs.gov/cgi-bin/grapher/table_setup.pl?site_id. DO was recorded at 30-min intervals. The data set contains a total of 5848 records for station USGS ID 14210000 and 5882 records for station USGS ID 14211010. The data cover all four seasons, with 1 month selected for each season: January for winter, April for spring, July for summer, and October for autumn. DO is the model output, and the following four components of the Gregorian calendar are the inputs of the developed models: (1) the year (YY), equal to 2015, (2) the month of the year (MM), between 01 (January) and 12 (December), (3) the day of the month (DD), between 01 and 31, and (4) the number of complete hours that have passed since midnight (HH), between 00 and 23. Descriptive statistics of the data set for the selected stations are summarized in Table 1, where Xmean, Xmax, Xmin, Sx and Cv denote the mean, the maximum, the minimum, the standard deviation, and the coefficient of variation, respectively.
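As an illustration of how the four calendar inputs and the 60/20/20 partition described above might be derived, the following Python sketch builds (YY, MM, DD, HH) from half-hourly timestamps and splits the records. The function names, the synthetic timestamps and the sequential (unshuffled) split are assumptions made for this example only; the study does not state how records were assigned to the three subsets.

```python
from datetime import datetime, timedelta


def calendar_features(timestamp):
    """Return the four calendar inputs (YY, MM, DD, HH) for one timestamp."""
    return (timestamp.year, timestamp.month, timestamp.day, timestamp.hour)


def split_60_20_20(records):
    """Split records sequentially into training, validation and test subsets.

    Assumption: records are taken in their stored order; the study may have
    used a different assignment (for example a random one).
    """
    n = len(records)
    n_train = (n * 60) // 100
    n_valid = (n * 20) // 100
    return (records[:n_train],
            records[n_train:n_train + n_valid],
            records[n_train + n_valid:])


# Hypothetical half-hourly timestamps for a single day in January 2015
start = datetime(2015, 1, 1, 0, 0)
timestamps = [start + timedelta(minutes=30 * k) for k in range(48)]

features = [calendar_features(t) for t in timestamps]
train, valid, test = split_60_20_20(features)
```

With 30-min sampling, two consecutive records share the same HH value, so the calendar inputs alone do not distinguish between them.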

Fig. 1 Map showing the locations of the two stations on the Clackamas River, Oregon, USA, with USGS station identification numbers (Adopted from Lee 2011)

Table 1 Statistical parameters of data set for the two stations

Radial basis function neural network (RBFNN)

The radial basis function neural network (RBFNN) has a feed-forward architecture with only three layers: an input layer, a hidden layer, and an output layer, as shown in Fig. 2. Each neuron in the hidden layer implements a radial basis function, and the output neuron uses a linear transfer function. Mathematically, the RBFNN structure shown in Fig. 2 can be described as follows.

Fig. 2 Architecture of the radial basis function neural network (RBFNN). YY year, MM month of year (1–12), DD day of month (1–31), HH hour of day (00–23)

The RBFNN Gaussian function can be written as:

$$ \varphi_{i}(x) = \exp\left( -\frac{\left\| x - \mu_{i} \right\|^{2}}{2\sigma_{i}^{2}} \right), \quad i = 1, 2, \ldots, N $$
(1)

where x is the input vector, μ_i is the centre of the i-th hidden neuron, and σ_i is its width (or spread).

The output of the RBFNN model is calculated as follows:

$$ \varUpsilon_{i} = \sum_{j = 1}^{N} w_{ij} \, \varphi_{j}(x) + B_{2} $$
(2)

where w_ij is the weight of the connection between the j-th radial basis function neuron and the i-th output neuron, N is the number of hidden-layer neurons, and the constant term B_2 is a bias. In recent years, RBFNN has been used successfully in many areas of scientific research (Pal et al. 2016; Bhunia et al. 2016; Parsaie 2016; Ehteshami et al. 2016; Barzegar and Moghaddam 2016; Heddam 2016a).
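The forward pass defined by Eqs. (1) and (2) can be sketched in a few lines of Python for a network with a single output neuron. The function names and all numerical values (centres, widths, weights and bias) are illustrative assumptions and are not the parameters fitted in this study; the procedure used to train the network is not reproduced here.

```python
import math


def gaussian_rbf(x, mu, sigma):
    """Eq. (1): response of one hidden neuron with centre mu and width sigma."""
    sq_dist = sum((xk - mk) ** 2 for xk, mk in zip(x, mu))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def rbfnn_output(x, centres, sigmas, weights, bias):
    """Eq. (2): weighted sum of the hidden-neuron responses plus the bias B_2."""
    hidden = [gaussian_rbf(x, mu, s) for mu, s in zip(centres, sigmas)]
    return sum(w * h for w, h in zip(weights, hidden)) + bias


# Illustrative parameters only (two hidden neurons, inputs ordered YY, MM, DD, HH)
centres = [(2015, 1, 15, 6), (2015, 7, 15, 14)]
sigmas = [5.0, 5.0]
weights = [12.0, 8.0]
bias = 0.5

# The input coincides with the first centre, so that neuron responds with 1
y = rbfnn_output((2015, 1, 15, 6), centres, sigmas, weights, bias)
```

Because the Gaussian depends on the squared Euclidean distance between input and centre, inputs with very different ranges (for example DD and HH) contribute unequally unless they are rescaled; whether and how the inputs were scaled in this study is not stated.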

Multiple linear regression (MLR)

Multiple linear regression (MLR) models are used to examine the relationship between a set of input variables and an output variable, using the following equation:

$$ Y = \varphi(x_{i}) = \psi_{0} + \psi_{1} x_{1} + \psi_{2} x_{2} + \psi_{3} x_{3} + \psi_{4} x_{4} $$
(3)

where Y is the dependent variable (DO), x_i are the independent variables (YY, MM, DD, and HH), and ψ_i are the parameters of the model.
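A minimal sketch of how the parameters of Eq. (3) could be estimated by ordinary least squares is given below in Python. It solves the normal equations by Gauss-Jordan elimination; the function names and the synthetic data are assumptions made for illustration, and the MATLAB routine actually used in the study is not specified. Because YY is constant (2015) in this data set, a YY column would be collinear with the intercept, so the toy example uses only MM, DD and HH; this is a simplification of the sketch.

```python
def mlr_predict(x, psi):
    """Eq. (3): intercept psi[0] plus the weighted sum of the inputs."""
    return psi[0] + sum(p * v for p, v in zip(psi[1:], x))


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (A^T A) psi = A^T y."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        if abs(M[piv][c]) < 1e-12:
            # Guards against an exactly singular system only
            raise ValueError("singular system: inputs are collinear")
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Synthetic (MM, DD, HH) inputs and an exactly linear response, for illustration
X = [(1, 1, 0), (1, 15, 12), (4, 3, 6), (4, 20, 18), (7, 9, 23), (10, 28, 3)]
true_psi = [10.0, -0.2, 0.01, 0.05]
y = [mlr_predict(row, true_psi) for row in X]

psi = fit_mlr(X, y)  # recovers true_psi up to rounding error
```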

Model performance indices

To evaluate the models we use three indices: the coefficient of correlation (R), the root mean squared error (RMSE) and the mean absolute error (MAE):

$$ R = \frac{\frac{1}{N}\sum_{i = 1}^{N} \left( O_{i} - O_{m} \right)\left( P_{i} - P_{m} \right)}{\sqrt{\frac{1}{N}\sum_{i = 1}^{N} \left( O_{i} - O_{m} \right)^{2}} \, \sqrt{\frac{1}{N}\sum_{i = 1}^{N} \left( P_{i} - P_{m} \right)^{2}}} $$
(4)
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i = 1}^{N} \left( O_{i} - P_{i} \right)^{2}} $$
(5)
$$ \mathrm{MAE} = \frac{1}{N}\sum_{i = 1}^{N} \left| O_{i} - P_{i} \right| $$
(6)

where N is the number of data points, O_i is the measured value, P_i is the corresponding model prediction, and O_m and P_m are the mean values of O_i and P_i, respectively.
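The three indices of Eqs. (4) to (6) translate directly into code. The Python sketch below follows the definitions term by term; the function names and the small observed/predicted series are hypothetical and serve only to show the calculation.

```python
import math


def correlation_coefficient(obs, pred):
    """Eq. (4): coefficient of correlation R between measured and predicted values."""
    n = len(obs)
    o_m = sum(obs) / n
    p_m = sum(pred) / n
    cov = sum((o - o_m) * (p - p_m) for o, p in zip(obs, pred)) / n
    var_o = sum((o - o_m) ** 2 for o in obs) / n
    var_p = sum((p - p_m) ** 2 for p in pred) / n
    return cov / (math.sqrt(var_o) * math.sqrt(var_p))


def rmse(obs, pred):
    """Eq. (5): root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Eq. (6): mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical measured and predicted DO values (mg/L)
observed = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 8.5, 10.5, 11.5]

r_value = correlation_coefficient(observed, predicted)
rmse_value = rmse(observed, predicted)  # every error has magnitude 0.5, so RMSE = 0.5
mae_value = mae(observed, predicted)    # and MAE = 0.5
```

R is dimensionless, whereas RMSE and MAE carry the units of the measured variable.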

Results and discussion

In this paper, two approaches were compared for modelling DO concentration without any water quality variables. The RBFNN and MLR models were implemented using two different program codes written in MATLAB, and their performance was compared to assess their efficacy in modelling DO concentration. The performance of the two models for the two stations is reported in Table 2. According to Table 2, the RBFNN models achieve a high level of accuracy for both stations in all three phases. For the USGS 14211010 station, the calculated DO values were correlated with the measured values with R equal to 0.975, 0.973, and 0.974 in the training, validation and test phases, respectively. The corresponding RMSE and MAE were 1.186 and 0.930 mg/L, 1.137 and 0.888 mg/L, and 1.184 and 0.934 mg/L in the training, validation and test phases, respectively. The results also show that the RBFNN is more accurate than the MLR model in all three phases. The MLR model gave the worst results, with R equal to 0.656 in the test phase. Figures 3 and 4 show scatter plots of the calculated against the corresponding measured DO for the RBFNN and MLR models, respectively, for the (a) training, (b) validation, (c) test, and (d) all data, for the USGS 14211010 station.

According to Table 2, for the USGS 14210000 station, the RBFNN results are again good, with R equal to 0.973 in the test phase, RMSE equal to 0.377, and MAE equal to 0.306. The MLR model again gave the worst results, with R equal to 0.762 in the test phase. Figures 5 and 6 show scatter plots of the calculated against the corresponding measured DO for the RBFNN and MLR models, respectively, for the (a) training, (b) validation, (c) test, and (d) all data, for the USGS 14210000 station.

Table 2 Performance of the RBFNN and MLR models in different phases

Fig. 3 Scatterplots of calculated versus measured values of dissolved oxygen (DO) for the USGS 14211010 station using the RBFNN model: (a) training, (b) validation, (c) testing, and (d) all data

Fig. 4 Scatterplots of calculated versus measured values of dissolved oxygen (DO) for the USGS 14211010 station using the MLR model: (a) training, (b) validation, (c) testing, and (d) all data

Fig. 5 Scatterplots of calculated versus measured values of dissolved oxygen (DO) for the USGS 14210000 station using the RBFNN model: (a) training, (b) validation, (c) testing, and (d) all data

Fig. 6 Scatterplots of calculated versus measured values of dissolved oxygen (DO) for the USGS 14210000 station using the MLR model: (a) training, (b) validation, (c) testing, and (d) all data

Conclusions

To address the lack of available water quality variables, which are generally used as inputs to models for predicting DO concentration in river ecosystems, this study proposes a new kind of model that can predict hourly dissolved oxygen without the need for any water quality variables. Using only the four components of the Gregorian calendar, we obtained very promising results. Future work should build on these promising results.