
1 Introduction

Forecasting water consumption at the household level can be approached from different perspectives. First, forecasted water consumption can serve as the basis for a system of forecasted bills or charges, in which the resident pays bills calculated from the forecasted water consumption. With accurate forecasts, water meters can be read less frequently and the charges paid by the consumer remain fair. For this purpose, a monthly time scale is usually applied.

From the perspective of water conservation, forecasted water consumption can be compared with the actual values. When the actual value is lower than the forecasted value, water conservation can be recognized; otherwise, the consumer may be warned of excessive water usage. For water conservation purposes, forecasts should be made on a fine-grained time scale, so households should be equipped with smart water meters capable of taking readings at daily, hourly, or even finer temporal resolution. Furthermore, to assess water conservation with respect to the type of consumption (e.g., showering, dish-washing, drinking, or other activities), water should be metered separately for each of the corresponding areas of consumption. Water-efficient technology such as smart meters plays a crucial role in reducing water consumption throughout the day [4].

The literature relevant to forecasting domestic water consumption falls into two areas: (a) the analysis of domestic water consumption [5, 15] and (b) the forecasting of time series [7, 8].

A detailed analysis of household water consumption was made in [16] and, more recently, in [17]. Water consumption in the kitchen was investigated in [20], where water withdrawals were related to different water-use events, e.g., drinking and cooking. In [4], the patterns of household water consumption were analyzed on an hourly scale. A household water-saving model based on a Bayesian network was proposed in [13]. In [9], a general architecture of a decision support system based on forecasting water consumption was described. More recently, the potential for water conservation during dish washing has been investigated in a dissertation [10].

Knowledge about the physical properties of households, e.g., the number of rooms, the type of house or flat, and the presence of a garden, was used in [5] for the analysis of residential water consumption. The problem of missing data in water meter readings was also investigated in [5]. IBM has developed a practical decision support system for household water conservation; according to the available report [12], one of the applied solutions was based on recognizing trends in water consumption. Overviews of the literature on the analysis of domestic water consumption are given in [5, 15].

The problem of forecasting domestic water consumption has been investigated in several publications. In [11], the daily and hourly patterns of water consumption in households were analyzed, and a general review of selected forecasting models used at the urban and household levels was given. Recently, it has been shown that the bottom-up approach to forecasting urban water demand is superior to the top-down approach based on bulk water meters [2].

Bayesian networks have also been used for forecasting time series [1, 22], and an overview of Bayesian forecasting approaches has recently been reported [3]. Our previous studies [6, 18] applied Bayesian models to forecasting water demand at the city level. Due to the different characteristics of the data and the application domain, those publications are not directly related to the approach presented in this paper.

Most of the forecasting models reported in the literature rely on information gathered from households through questionnaires, e.g., about household occupancy or income, or even from daily diaries kept by residents. Gathering such data requires significant additional effort, and the data are usually not anonymous. By contrast, our approach uses only near real-time readings of water meters.

For the purpose of forecasting domestic water consumption, we design the structure of a Bayesian network. In addition, we investigate a related problem: how the number of intervals used in equal-length discretization influences forecasting accuracy.

The paper is organized as follows. Section 2 provides basic notions on Bayesian networks. Section 3 describes our contribution: the adaptation of Bayesian networks to the forecasting of domestic water consumption. In Sect. 4, the forecasting accuracy measures selected for this research are reviewed. The results of experiments are presented in Sect. 5. Section 6 concludes the paper.

2 Bayesian Networks—Basic Notions

A Bayesian network (BN) is a knowledge representation tool, which we propose in this paper for forecasting domestic water consumption. A BN can relate the symbolic values of explanatory variables to the discretized numerical values of the forecasted time series. It is a transparent graphical model that explicitly depicts the probabilistic dependencies discovered in data.

A Bayesian network is a triple \(BN = (X, DAG, P)\), where X is a set of random variables, \(DAG = (V, E)\) is a directed acyclic graph, and P is a set of conditional probability distributions [14]. Each node \(v_i \in V\) of the graph corresponds to a discrete random variable \(X_i \in X\). The edges \(E \subseteq V \times V\) of the DAG encode the conditional dependencies between the random variables. For each \(X_i \in X\), \(P(X_i \mid X_{pa(i)})\) is its conditional probability distribution, where pa(i) is the set of parents of \(X_i\), i.e., its conditioning variables. Given a set of evidence variables \(X_E \subset X\), the BN model is used to calculate the posterior probability of an unknown \(X_i\).
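For reference, the definition above implies the standard chain-rule factorization of the joint distribution over \(X = \{X_1, \ldots, X_m\}\) (the symbol m, the number of variables, is introduced here for illustration). The posterior of \(X_i\) given evidence \(X_E = e\) then follows by summing out the remaining hidden variables \(X_H = X \setminus (X_E \cup \{X_i\})\):

$$\begin{aligned} P(X_1, \ldots , X_m) &= \prod _{i=1}^{m} P(X_i \mid X_{pa(i)}),\\ P(X_i = x \mid X_E = e) &= \frac{\sum _{x_H} P(X_i = x, X_H = x_H, X_E = e)}{\sum _{x'} \sum _{x_H} P(X_i = x', X_H = x_H, X_E = e)}. \end{aligned}$$

When all parents of \(X_i\) are observed and none of its descendants is, the local Markov property reduces this posterior to the corresponding row of \(P(X_i \mid X_{pa(i)})\).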

The learning of the Bayesian network consists of two steps: (1) structure learning and (2) learning of parameters, i.e., the probability distributions P.

To learn the structure of the BN, we selected existing implementations of state-of-the-art algorithms: Hill-Climbing, a greedy search in the space of directed acyclic graphs; Tabu Search, a variant of Hill-Climbing with an additional mechanism for escaping from local optima; Max-Min Hill-Climbing, a hybrid algorithm combining the Max-Min Parents and Children algorithm with Hill-Climbing; and Restricted Maximization, a generalization of Max-Min Hill-Climbing.
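The score-based search behind Hill-Climbing can be sketched compactly. The Python below is a minimal sketch, not bnlearn's implementation: it assumes discrete data given as a list of dicts, uses the BIC score (one common decomposable score), and, for brevity, considers only single-edge additions and deletions (no reversals), stopping when no move improves the score. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def node_bic(data, child, parents, card):
    """BIC contribution of one node given its parent set (discrete data)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    margin = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / margin[cfg]) for (cfg, _), c in joint.items())
    n_params = math.prod(card[p] for p in parents) * (card[child] - 1)
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, u, v):
    """Adding the edge u -> v creates a cycle iff u is reachable from v."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        # children of `node` are the variables that list it as a parent
        stack.extend(c for c, ps in parents.items() if node in ps)
    return False


def hill_climb(data, variables, max_iter=100):
    """Greedy search over DAGs using single-edge additions and deletions."""
    card = {v: len({row[v] for row in data}) for v in variables}
    parents = {v: set() for v in variables}
    score = {v: node_bic(data, v, [], card) for v in variables}
    for _ in range(max_iter):
        best_gain, best_move = 1e-9, None  # require a strictly positive gain
        for u, v in permutations(variables, 2):
            if u in parents[v]:
                candidate = parents[v] - {u}  # delete u -> v
            elif not creates_cycle(parents, u, v):
                candidate = parents[v] | {u}  # add u -> v
            else:
                continue
            # BIC is decomposable: only the score of node v changes
            gain = node_bic(data, v, sorted(candidate), card) - score[v]
            if gain > best_gain:
                best_gain, best_move = gain, (v, candidate)
        if best_move is None:  # local optimum reached
            break
        v, candidate = best_move
        parents[v] = candidate
        score[v] = node_bic(data, v, sorted(candidate), card)
    return parents


# Toy usage: Y copies X, Z is independent of both
rows = [{"X": x, "Y": x, "Z": z} for x in (0, 1) for z in (0, 1) for _ in range(50)]
learned = hill_climb(rows, ["X", "Y", "Z"])
```

On the toy data, the search links X and Y and leaves Z isolated; Tabu Search would additionally allow a limited number of non-improving moves to escape local optima.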

To learn the parameters, we selected Maximum Likelihood Estimation (MLE). Details of the applied algorithms are available in the documentation of the 'bnlearn' package [21] for R [19].
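For fully observed discrete data, the MLE of a conditional probability table reduces to relative frequencies, \(\hat{P}(X_i = x \mid X_{pa(i)} = c) = N(x, c)/N(c)\), where N counts matching rows. The Python below is a minimal sketch of that counting, not the bnlearn code; the function name and the column names in the usage are hypothetical.

```python
from collections import Counter, defaultdict


def mle_cpt(rows, child, parents):
    """Estimate P(child | parents) by relative frequencies (maximum likelihood)."""
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parents)  # observed parent configuration
        counts[config][row[child]] += 1
    cpt = {}
    for config, counter in counts.items():
        total = sum(counter.values())
        cpt[config] = {value: c / total for value, c in counter.items()}
    return cpt


# Hypothetical usage: interval index "Wd" conditioned on the part of the day "POD"
rows = (3 * [{"POD": "night", "Wd": 1}]
        + [{"POD": "night", "Wd": 2}]
        + 2 * [{"POD": "morning", "Wd": 3}])
cpt = mle_cpt(rows, "Wd", ["POD"])
```

Parent configurations that never occur in the learning data get no estimate at all under plain MLE; a Bayesian (e.g., Dirichlet-prior) estimator is one common way to cover them.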

3 Bayesian Approach to the Forecasting of Domestic Water Consumption

Let \(\{W(t)\}\) denote the considered water consumption time series, where \(W(t) \in \mathbb{R}\) is a real-valued variable and \(t \in \{1, 2, \ldots, n\}\) is a discrete time scale of length \(n \in \mathbb{N}\). We define the following random variables related to the time series. Let \(Hour(t) \in [1,24]\) be a variable indicating the hour of the day. Similarly, \(Day(t) \in [1,7]\) is a variable indicating the day of the week.
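One way to derive Hour(t) and Day(t) from meter timestamps is sketched below. The paper does not state the conventions, so they are assumptions: timestamps are Python `datetime` objects, the hour starting at 00:00 maps to 1 (so that Hour(t) lies in [1, 24]), and days follow ISO numbering with Monday = 1. The function name is hypothetical.

```python
from datetime import datetime


def calendar_variables(ts: datetime) -> tuple:
    """Return (Hour(t), Day(t)) under the assumed conventions."""
    hour = ts.hour + 1     # assumed: 00:00-00:59 -> 1, ..., 23:00-23:59 -> 24
    day = ts.isoweekday()  # assumed: Monday -> 1, ..., Sunday -> 7
    return hour, day
```

For example, a reading at 00:00:30 on 28 November 2014 (a Friday, the first day of the data) yields (1, 5).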

The analysis of the water consumption data revealed that variations in water consumption are related to the part of the day. Five intervals can be distinguished in which water consumption can be analyzed: morning, day, afternoon, evening, and night. Determining these periods with respect to people's behavior is a complex problem. For this paper, we decided to make an arbitrary mapping of these intervals to the hours of the day, based on observations of the behavior of the inhabitants of the single household considered in our research. The assumed mapping is given in Table 1. Further investigation of this mapping is left for future research.

Table 1 Temporal partitioning of day

The partitioning of the day led to the introduction of the related variable \(POD(t) \in \{morning, day, afternoon, evening, night\}\), which takes symbolic values corresponding to the parts of the day.
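The mapping from Hour(t) to POD(t) can be implemented as a lookup over interval start hours. In the minimal sketch below, the boundaries are HYPOTHETICAL placeholders chosen only for illustration; they are not the mapping of Table 1, which should be substituted in practice. The names are hypothetical as well.

```python
# HYPOTHETICAL start hours (Hour(t) in [1, 24]) for illustration only;
# they are NOT the mapping of Table 1.
HYPOTHETICAL_POD_STARTS = [
    (6, "morning"),
    (10, "day"),
    (14, "afternoon"),
    (18, "evening"),
    (22, "night"),
]


def part_of_day(hour, starts=HYPOTHETICAL_POD_STARTS):
    """Map Hour(t) in [1, 24] to POD(t); `starts` must be sorted by start hour."""
    if not 1 <= hour <= 24:
        raise ValueError("hour must be in [1, 24]")
    # Hours before the first start wrap around to the last interval (e.g., night).
    label = starts[-1][1]
    for start, name in starts:
        if hour >= start:
            label = name
    return label
```

Passing the mapping as a parameter keeps the arbitrary choice explicit, so alternative partitions can be compared without touching the rest of the pipeline.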

To enable the application of a standard Bayesian network to the forecasting of water consumption, we discretize the domain of W(t). For the purpose of this paper, we decided to use the simplest method, equal-length discretization. The discretized version of W(t) is denoted as \(W_d(t) \in \mathbb{N}\), where the values of \(W_d(t)\) are positive integers specifying the intervals to which the actual values of W(t) belong. The parameter \(k \in \mathbb{N}\) determines the number of discretization intervals; over the observed range of W(t), each interval therefore has length \(d = (\max_t W(t) - \min_t W(t))/k\).
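A minimal sketch of equal-length discretization in Python is shown below. It assumes the interval boundaries are computed from the values passed in; whether the range is fixed on the whole series or on each learning window is not stated in the text, so that choice is left to the caller. The function name is hypothetical.

```python
import math


def equal_length_discretize(values, k):
    """Map each value to an interval index in 1..k; all intervals have length d."""
    lo, hi = min(values), max(values)
    d = (hi - lo) / k  # length of each discretization interval
    if d == 0:  # constant series: everything falls into interval 1
        return [1] * len(values), d
    codes = []
    for w in values:
        idx = math.floor((w - lo) / d) + 1
        codes.append(min(idx, k))  # the maximum lies on the upper edge; keep it in interval k
    return codes, d
```

Returning d alongside the codes keeps the interval length available for error measures expressed in the original units.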

Table 2 The set of variables

The lagged value \(W_d(t-1)\) was taken as the last explanatory variable. Given the very short forecasting interval of 30 s and the lack of short-term seasonality, confirmed by the plot of the autocorrelation function (ACF), there was no reason to use higher-order lags. The final set of variables is given in Table 2.
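The two steps above, checking the ACF and pairing each value with its first lag, can be sketched as follows. The sketch uses the common biased sample ACF estimator (one shared denominator for all lags); the function names are hypothetical, and a constant series (zero variance) is not handled.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def lag_pairs(codes):
    """Pairs (W_d(t-1), W_d(t)) for t = 2..n, used as (evidence, target)."""
    return list(zip(codes[:-1], codes[1:]))
```

Pronounced ACF peaks at lags other than 1 would suggest seasonal lags worth adding as further evidence variables; their absence supports using only \(W_d(t-1)\).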

4 Forecasting Accuracy Measures

To evaluate the forecasting accuracy for the discretized time series \(W_d(t)\), we use two measures already applied in our previous study [18]. The ratio of perfect forecasts (PFR), i.e., the fraction of all forecasts that hit the actual interval, is calculated using formula (1):

$$\begin{aligned} PFR = \frac{\sum _{t=1}^{n} \mathbf {1}\left[ W'_d(t) = W_d(t)\right] }{n}, \end{aligned}$$
(1)

where \(W'_d(t)\) and \(W_d(t)\) denote the forecasted and actual discretized values of water consumption, respectively; \(\mathbf{1}[\cdot]\) equals 1 if its argument holds and 0 otherwise; and \(n \in \mathbb{N}\) is the final time step of the forecasted time series.
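Formula (1) translates directly into code; the Python below is a minimal sketch with a hypothetical function name.

```python
def pfr(forecast, actual):
    """Perfect forecast ratio: share of time steps with W'_d(t) == W_d(t)."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(f == a for f, a in zip(forecast, actual)) / len(actual)
```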

To measure the accumulated forecasting error, we use the discrete mean absolute error (DMAE), given by (2):

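$$\begin{aligned} DMAE = \frac{\sum _{t=1}^{n} \left( |W'_d(t) - W_d(t)| + 1\right) \cdot d}{n}, \end{aligned}$$
(2)

where d denotes the length of the applied discretization interval. Because a forecast that hits the correct interval can still miss the actual value by up to d, each term \((|W'_d(t) - W_d(t)| + 1) \cdot d\) is an upper bound on the absolute forecasting error in the original units.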
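A minimal Python sketch of DMAE follows. It assumes the +1 term is added outside the absolute value, so that each term equals (|W'_d(t) − W_d(t)| + 1)·d; the function name is hypothetical.

```python
def dmae(forecast, actual, d):
    """Discrete MAE; each term is (|W'_d(t) - W_d(t)| + 1) * d (assumed form)."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((abs(f - a) + 1) * d for f, a in zip(forecast, actual)) / len(actual)
```

Under this form, even perfect forecasts contribute d each, so DMAE shrinks as the intervals become shorter.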

To evaluate the forecasting accuracy, the time series was partitioned into learning and testing periods using a growing window. The learning period began at the first time step and ended at time \(t-1\) of the historical time series, so the window grows by one step as time advances. The BN model was retrained at every time step, and a 1-step-ahead prediction was made, i.e., for step t. The minimum length of the growing window was set to 20 steps (days).
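The growing-window protocol can be sketched as a loop that refits on all steps before t and forecasts step t. The forecaster below is an assumption for illustration, not the paper's setup: a BN1-like model with only \(W_d(t-1)\) as evidence, whose forecast is the most probable interval under the MLE conditional distribution, falling back to the marginal distribution when the evidence value has not been seen. All names are hypothetical.

```python
from collections import Counter, defaultdict


def forecast_mode(history, evidence):
    """Most frequent W_d(t) following W_d(t-1) == evidence in the history."""
    cond = defaultdict(Counter)
    for prev, cur in zip(history[:-1], history[1:]):
        cond[prev][cur] += 1
    # fall back to the marginal distribution if the evidence value is unseen
    table = cond.get(evidence) or Counter(history)
    return table.most_common(1)[0][0]


def growing_window(codes, min_window=20):
    """1-step-ahead forecasts; the model is refit on codes[:t] at every step t."""
    forecasts, actuals = [], []
    for t in range(min_window, len(codes)):
        history = codes[:t]  # learning period: all steps before t
        forecasts.append(forecast_mode(history, history[-1]))
        actuals.append(codes[t])
    return forecasts, actuals
```

Refitting from scratch at every step costs quadratic time overall; incrementally updating the counts would give the same forecasts more cheaply.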

5 Experiments

For the purpose of this study, we collected readings from smart water meters installed in the kitchen of an anonymous household. All experiments described in this section were performed using the 'bnlearn' package [21] for R [19].

5.1 Source Data

Source data were collected during the initial period of the project, from 28 November 2014 to 21 February 2015. Due to this short period, the collected time series does not cover monthly variations of water consumption. The data were gathered from two smart meters: one measuring the water intake of the dishwasher and one measuring that of the sink. Readings were taken at 30 s intervals.

Figure 1a depicts the mean water usage of the dishwasher for every day of the week. The dishwasher was used only on Fridays, Saturdays, and Sundays, with very low variation in water usage. For that reason, we focused only on forecasting water consumption at the sink. As Fig. 1b shows, the highest water consumption occurred just after and just before weekends, i.e., on Mondays and Fridays, respectively. As Fig. 1c shows, the highest mean water consumption occurs in the afternoon, with very low consumption at night. On the hourly scale, a peak can be observed at about 5 AM (Fig. 1d).

Fig. 1

Mean water consumption in the kitchen: (a) dishwasher, mean daily consumption; (b) sink, mean daily consumption; (c) sink, mean daily pattern; (d) sink, mean hourly consumption

Table 3 Distribution of water consumption during the week

Table 3 presents the distribution of water consumption with respect to the days of the week and the parts of the day. Peaks in water usage were recorded on Monday during the day and on Friday afternoon. The content of Table 3 can be interpreted as a weekly consumption profile of the individual water consumer. Further analysis of this type of profile for other water intake points in the kitchen is planned. Once data from other households have been gathered, the analysis will be extended to other users.

5.2 Experimental Results

First, the structure of the BN was learned automatically using each of the algorithms listed in Sect. 2. However, the resulting values of DMAE were high, so we designed the structure of the BN manually. Following our previous experience with forecasting time series using Bayesian networks [6, 18], we assumed \(W_d(t)\) to be the single dependent variable. All remaining variables served as evidence variables with a direct influence on \(W_d(t)\), i.e., as its parents. Consequently, only forward inference within the BN was required. Table 4 describes the investigated BNs with the related evidence variables.

Table 4 Forecasting accuracy for the designed structures of BNs

We started with a simple network, BN1, containing only a single evidence variable, with the conditional distribution \(W_d(t) \mid W_d(t-1)\) assigned to the child node. In the second step, we extended the network by adding other variables, which yielded BN2, BN3, and BN4. Because adding POD(t) proved the most beneficial in terms of DMAE, we further extended BN3 by adding first Day(t) and then Hour(t), obtaining BN5 and BN6. The resulting values of DMAE, with PFR given in brackets, are shown in Table 5. The parameter k denotes the number of equal-length intervals used for the discretization.
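To give a sense of how the designed structures grow, the conditional distribution of \(W_d(t)\) has \(k-1\) free parameters per configuration of its parents. The sketch below counts them, assuming (as the text suggests but does not state outright) that BN3 is BN1 plus POD(t), and cardinalities of k for \(W_d(t-1)\), 5 for POD(t), 7 for Day(t), and 24 for Hour(t). BN2 and BN4 are omitted because their evidence sets are given only in Table 4; all names are hypothetical.

```python
import math

# Cardinalities of the symbolic evidence variables; W_d(t-1) has k states.
CARD = {"POD": 5, "Day": 7, "Hour": 24}

# Parent sets of W_d(t), as read from the text (assumption for BN3).
PARENTS = {
    "BN1": ["Wd_lag"],
    "BN3": ["Wd_lag", "POD"],
    "BN5": ["Wd_lag", "POD", "Day"],
    "BN6": ["Wd_lag", "POD", "Day", "Hour"],
}


def cpt_free_params(k, parents):
    """Free parameters of P(W_d(t) | parents): (k - 1) per parent configuration."""
    configs = math.prod(k if p == "Wd_lag" else CARD[p] for p in parents)
    return (k - 1) * configs
```

With k = 30, the count rises from 870 for BN1 to 4350 for BN3 and 730800 for BN6, all to be estimated from the same learning window.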

Table 5 Forecasting accuracy DMAE (PFR) for the designed structures of BNs

Table 5 shows that BN3 is the best model and that adding further evidence variables does not improve forecasting accuracy.

Fig. 2

Discrete mean absolute error of BN3 for \(k \in [5,50]\)

As expected, the ratio of perfect forecasts decreases as the number of intervals increases, since the intervals become shorter. On the other hand, DMAE decreases, reaching its lowest values for \(k > 30\). To investigate this dependency further, we repeated the experiment with BN3 for all \(k \in [5,50]\). The results, shown in Fig. 2, confirm that the lowest values of DMAE are obtained for \(k > 30\); beyond that point, DMAE only varies randomly. Because PFR decreases as k grows, we selected \(k = 30\), at the boundary of this region, for future research, which keeps PFR as high as possible for the considered data.

6 Conclusions

In the presented study, we investigated the application of standard Bayesian networks to the forecasting of domestic water consumption. We designed a Bayesian network model that efficiently forecasts a discretized time series of domestic water consumption. We found that the evidence variable related to the part of the day improves forecasting accuracy. Further work is required to compare the obtained results with other forecasting methods. After collecting more data covering seasonal and monthly variations of water consumption, the theoretical and experimental research will be extended. The planned research will also include the analysis of user-specific water consumption profiles.