1 Introduction

Accurate urban water demand forecasting (Gardiner and Herrington 1990) provides the basis for the operational and strategic decisions of drinking water utilities, e.g., controlling production, storage and water delivery, both in the short term (Bougadis et al. 2005) and in the long term (Xu et al. 2015). Most papers on urban water demand forecasting consider annual or monthly data; few address daily water use. For instance, Maidment and co-workers (1985) developed a short-term forecasting model based on Box–Jenkins time series analysis, and Chen and Boccelli (2014) proposed an integrated Time Series Forecasting Framework (TSFF) to statistically predict hourly and quarter-hourly demands. The problem is not trivial, since many factors influence drinking water demand, e.g., trend, seasonality, climatic correlation and autocorrelation. In particular, observations that are close in time are expected to be highly correlated, in contrast to time-distant ones.

In the following, the authors show that the water levels of urban tanks serving water distribution networks can be adequately forecast by a stochastic model that combines autoregressive and moving average terms, namely the AutoRegressive Integrated Moving Average (ARIMA) model (Box et al. 2015). The proposed models are calibrated on a dataset of water levels recorded in the tank of Cesine (Avellino, Italy).

2 Materials and Methods

In this section, the two models chosen for analyzing the available dataset are presented. The coefficients of the models have been estimated with the “R” statistical software, using part of the available dataset for calibration. The software estimates the model parameters by maximum likelihood.

2.1 Model 1, ARIMA (2,0,2)

This model applies no differencing to the data. The forecast, given by Eq. (1), estimates the quantity Y at time t. In our case, Y is the tank level, and \(\mu\) is the intercept of the model, while \(\varphi_{1}\), \(\varphi_{2}\) and \(\vartheta_{1}\), \(\vartheta_{2}\) are, respectively, the autoregressive (AR1 and AR2) and moving average (MA1 and MA2) parameters:

$$\hat{Y}_{t} = \mu + \varphi_{1} \left( {Y_{t - 1} - \mu } \right) + \varphi_{2} \left( {Y_{t - 2} - \mu } \right) + \vartheta_{1} \left( {e_{t - 1} } \right) + \vartheta_{2} \left( {e_{t - 2} } \right)$$
(1)

\(e_{t - i}\) is the residual at time t-i (the difference between observed and estimated tank level). Since the forecast requires \(e_{t - 1}\), which is available only once the level at time t-1 has been observed, this model can provide only “one step ahead” predictions. The values of the model parameters, estimated using the calibration dataset, are reported in Table 1. The intercept of about 2.7 m can be interpreted as the mean value of the tank level during the calibration period.
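The recursion in Eq. (1) can be illustrated with a minimal sketch in plain Python. This is not the authors' R implementation: the function name, the toy level series and all coefficient values below are hypothetical and chosen only for illustration, and the start-up residuals are assumed to be zero.

```python
def arima_202_one_step(levels, mu, phi1, phi2, theta1, theta2):
    """One-step-ahead forecasts and residuals following Eq. (1).

    The first two entries have no forecast (two past levels are needed).
    Assumption: the unknown start-up residuals are set to zero.
    """
    forecasts = [None, None]
    residuals = [0.0, 0.0]
    for t in range(2, len(levels)):
        y_hat = (mu
                 + phi1 * (levels[t - 1] - mu)
                 + phi2 * (levels[t - 2] - mu)
                 + theta1 * residuals[t - 1]
                 + theta2 * residuals[t - 2])
        forecasts.append(y_hat)
        # The residual is known only after the level at time t is observed,
        # which is why forecasts are limited to one step ahead.
        residuals.append(levels[t] - y_hat)
    return forecasts, residuals


# Hypothetical coefficients and toy tank levels (m), NOT the values of Table 1.
toy_levels = [2.7, 2.8, 2.6, 2.9]
toy_forecasts, toy_residuals = arima_202_one_step(
    toy_levels, mu=2.7, phi1=0.5, phi2=0.2, theta1=0.3, theta2=0.1)
print(toy_forecasts[2:])
```

Each forecast uses only levels and residuals up to the previous day, so the loop cannot be extended beyond one step without replacing the unknown future residuals by an assumption.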

Table 1 Estimated coefficients of the ARIMA (2,0,2) model and related standard errors

2.2 Model 2, ARIMA (3,1,3)

This model applies first-order differencing and includes one more term in both the autoregressive and the moving average parts. The forecasting formula, Eq. (2), is reported below and, again, estimates the quantity Y at time t. Y, \(\varphi_{i}\), \(\vartheta_{i}\) and \(e_{t - i}\) have the same meaning as above:

$$\hat{Y}_{t} = Y_{t - 1} + \varphi_{1} \left( {Y_{t - 1} - Y_{t - 2} } \right) + \varphi_{2} \left( {Y_{t - 2} - Y_{t - 3} } \right) + \varphi_{3} \left( {Y_{t - 3} - Y_{t - 4} } \right) + \vartheta_{1} \left( {e_{t - 1} } \right) + \vartheta_{2} \left( {e_{t - 2} } \right) + \vartheta_{3} \left( {e_{t - 3} } \right)$$
(2)

Again, the prediction requires \(e_{t - 1}\), so only “one step ahead” forecasts can be produced. Table 2 reports the model parameters estimated using the calibration dataset.
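Equation (2) can also be read as an ARMA(3,3) recursion on the differenced series. As a short check, with the auxiliary symbol \(W_{t} = Y_{t} - Y_{t - 1}\) (introduced here only for illustration; it does not appear in the original formulation), Eq. (2) becomes:

$$\hat{W}_{t} = \varphi_{1} W_{t - 1} + \varphi_{2} W_{t - 2} + \varphi_{3} W_{t - 3} + \vartheta_{1} e_{t - 1} + \vartheta_{2} e_{t - 2} + \vartheta_{3} e_{t - 3} ,\qquad \hat{Y}_{t} = Y_{t - 1} + \hat{W}_{t}$$

The residual is unchanged by this rewriting, since \(e_{t} = Y_{t} - \hat{Y}_{t} = W_{t} - \hat{W}_{t}\). No intercept appears in this form, so the forecast is anchored to the last observed level \(Y_{t - 1}\) rather than to a long-run mean.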

Table 2 Estimated coefficients and standard error of the ARIMA (3,1,3) model

3 Case Study and Dataset Description

The models presented above have been calibrated on a time series of water levels measured in the Cesine tank. This tank belongs to the network that provides drinking water to the city of Avellino, Italy. The chosen time series refers to 2014 and reports the daily level measurements, in meters. The first 333 values (from January 2 to November 30, 2014) have been used to calibrate the models, i.e., to estimate the parameters in the “R” software (see Sect. 2). The last month, December 2014, has been used to validate the models, i.e., to compare the model results with data that have not been used in the estimation of the parameters. The results of the validation will be presented in a further work.
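The size of the two periods can be checked with a few lines of Python. This is only an illustrative sketch: the variable names and the helper `split_series` are hypothetical, and it assumes one value per calendar day starting on January 2, 2014.

```python
from datetime import date

first_day = date(2014, 1, 2)
calibration_end = date(2014, 11, 30)
validation_end = date(2014, 12, 31)

# Both end points are included in the calibration period.
n_calibration = (calibration_end - first_day).days + 1
# The validation period starts the day after the calibration period ends.
n_validation = (validation_end - calibration_end).days


def split_series(levels, n_cal):
    """Split a daily series into calibration and validation parts."""
    return levels[:n_cal], levels[n_cal:]


print(n_calibration, n_validation)
```

Under these assumptions the calibration period contains 333 daily values, in agreement with the text, and the validation period contains the 31 days of December.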

Since some data were missing and the models require a continuous dataset, each missing value has been imputed with the mean of the preceding and following measured values. This simple approach usually works for non-seasonal time series, such as the one considered here, since linear interpolation accounts for the temporal location of the missing data with respect to the closest points (Moritz et al. 2015). Other imputation techniques can be found in the literature, for example in Guarnaccia et al. (2015), where a deterministic time series model and a regression method are compared.
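The imputation rule described above can be sketched as follows. The function name and the use of `None` to mark missing values are assumptions made for this illustration, not details taken from the original processing.

```python
def impute_with_neighbor_mean(levels):
    """Replace each None with the mean of the nearest measured values
    before and after it. Gaps at the very start or end of the series
    have only one neighbor and are left unchanged."""
    filled = list(levels)
    n = len(levels)
    for i, value in enumerate(levels):
        if value is not None:
            continue
        # Nearest measured value before and after position i (if any).
        prev_idx = next((j for j in range(i - 1, -1, -1)
                         if levels[j] is not None), None)
        next_idx = next((j for j in range(i + 1, n)
                         if levels[j] is not None), None)
        if prev_idx is not None and next_idx is not None:
            filled[i] = (levels[prev_idx] + levels[next_idx]) / 2.0
    return filled


# Toy daily levels (m) with one missing value; not data from the Cesine tank.
toy_series = [2.6, None, 3.0, 2.8]
toy_filled = impute_with_neighbor_mean(toy_series)
print(toy_filled)
```

For an isolated missing value, this mean coincides with linear interpolation between the two neighbors. For several consecutive missing values, the sketch fills the whole gap with the same mean, which differs from a linear interpolation across the gap.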

4 Results and Discussion

The results of the models are plotted in Figs. 1 and 2, overlaid on the observed data. The statistics of the residuals (i.e., observed minus predicted tank level in each time period) are reported in Table 3.
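Residual statistics of this kind can be computed as in the sketch below. The function name and the set of statistics (mean, standard deviation, median, minimum, maximum) are assumptions chosen for illustration and may not match the ones reported in Table 3; the numbers are toy values.

```python
from statistics import mean, median, stdev


def residual_summary(observed, predicted):
    """Summary statistics of the residuals (observed minus predicted)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return {
        "mean": mean(residuals),
        "std": stdev(residuals),      # sample standard deviation
        "median": median(residuals),
        "min": min(residuals),
        "max": max(residuals),
    }


# Toy observed and predicted levels (m), not data from the case study.
toy_observed = [2.7, 2.9, 2.5]
toy_predicted = [2.6, 2.8, 2.7]
toy_summary = residual_summary(toy_observed, toy_predicted)
print(toy_summary)
```

With this sign convention, a negative mean residual indicates that the model overestimates the tank level on average, while a mean close to zero indicates no systematic bias.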

Fig. 1 Observed and forecasted water levels of the Cesine tank in the calibration time range. The black line is the observed series, and the red line is the forecast of the ARIMA (2,0,2) model

Fig. 2 Observed and forecasted water levels of the Cesine tank in the calibration time range. The black line is the observed series, and the blue line is the forecast of the ARIMA (3,1,3) model

Table 3 Summary statistics of the residuals during the calibration phase

Note that both models provide “one step ahead” predictions, i.e., they use the data available up to the previous day to produce the forecast (see Eqs. 1 and 2).

Fig. 1 shows that model 1 (red line), i.e., ARIMA (2,0,2), follows the observed data (black line) quite closely. Sudden and steep fluctuations of the tank level, however, are not captured exactly, because the model needs at least one time period (in this case one day) to react to sudden variations of the series.

Fig. 2 compares model 2 (blue line), i.e., ARIMA (3,1,3), with the observed data (black line). The more complex model, with first-order differencing and two additional terms, does not provide substantial benefits in the prediction. The model tends to overestimate the observed levels, as confirmed by the mean of the residuals in Table 3. Model 1 performs better than model 2 and should be preferred in this case study of “one step ahead” prediction of daily water tank levels.

5 Final Remarks

In this paper, the modeling of the daily level of a water tank has been addressed by means of the ARIMA approach. The case study of the Cesine tank, which provides drinking water to Avellino, Italy, has been used for the calibration of the parameters and for the comparison between observed and forecasted data. The proposed models provide predictions for the following day, with quite good results. Model 1, i.e., ARIMA (2,0,2), gives, on average, better results than model 2, i.e., ARIMA (3,1,3), even though it has fewer parameters. The mean of the residuals, i.e., the differences between observed and predicted tank levels, is very close to zero for both models, suggesting that this approach can be successfully applied to this kind of problem.