1 Introduction

Recently, several software programs have been developed with the objective of modeling, in detail, the hydraulic behavior of drinking water distribution systems. One of the hydraulic variables used in these models is residential water demand, which has been idealized as a variable that varies hourly using a smooth Hourly Demand Variation Curve (HDVC). The HDVC is used in practically all known public domain and commercially available drinking water distribution network modeling software programs, such as EPANET, InfoWorks®, ScadRED® and others. However, at a residential service level, this curve does not accurately reflect reality. Residential demand is sporadic, characterized by sudden demand pulses, and tends to have a stochastic character (Buchberger et al. 2003a; Alvisi et al. 2003; Alcocer-Yamanaka 2007), especially when considering time scales on the order of seconds. Therefore, models with a stochastic focus were recently developed to represent residential water demand. These models include the Poisson Rectangular Pulse (PRP) model (Buchberger and Wu 1995; Buchberger et al. 2003a) and the Neyman-Scott Rectangular Pulse (NSRP) model (Alvisi et al. 2003; Alcocer-Yamanaka et al. 2008a, b). To be applied, the PRP model requires direct recording of the instantaneous water demand (with a one-second time interval), while the NSRP model considers temporal disaggregation of demand so that different recording intervals can be used. The demand series generated by both models have statistical parameters, such as the mean, variance, covariance and probability distribution, which are similar or identical to those of the original (observed) demand series. The estimation of parameters and the generation of synthetic demand series make it possible to minimize the amount of information necessary to model residential water demand.

These models were primarily applied in the hydrology field to generate synthetic series, representing rainfall or storm events according to the projected interval and duration (Rodriguez-Iturbe et al. 1984; Rodriguez-Iturbe et al. 1987; Entekhabi et al. 1989; Entekhabi and Bras 1990). To the best of our knowledge, only two studies have used the NSRP model in an attempt to represent residential water demand (Alvisi et al. 2003 and Alcocer-Yamanaka 2007). Buchberger et al. (2003b) and Buchberger et al. (2003c) were the first to apply PRP generated stochastic demand to water distribution network modeling, but no field verification was attempted in their study. In this paper, NSRP generated stochastic demand is applied to modeling a real water distribution network and the results obtained are compared with field observations and conventional deterministic models.

2 Application Site and Deterministic Focus

In this study, deterministic and stochastic models were applied to the drinking water network in the Humaya sector of the city of Culiacan, Sinaloa due to the large amount of field data available (Alcocer-Yamanaka and Tzatchkov 2002, 2003, 2004; Alcocer-Yamanaka et al. 2004, 2008a, b; Tzatchkov et al. 2004, 2005). The data available includes pressure and flow rate measurements at the supply sources and at internal points in the water distribution network, as well as water heights in the regulation tank. Furthermore, water quality measurements, including residual chlorine, total organic carbon, pH and temperature measurements have been taken at the supply sources and at points inside the distribution network, and residential demand measurements for 69 homes, with a 1-min time step and mean period of 7 days, are available.

The area has two supply sources. The first consists of a single well that yields an average flow rate of 51 L/s, and the second is a group of eight wells that have a maximum capacity of 200 L/s. There are two regulation tanks, one with a capacity of 3,000 m3 and an altitude of 82.63 m above sea level. The other tank has a capacity of 2,000 m3, and 80.00 m of altitude above sea level.

The population of the study area was estimated to be 85,483 in 2005, at the time the field measurements were carried out. This figure was based on the number of service connections (20,353) in each suburb and subdivision and the crowding index (4.20 inhabitants/service connection), according to information from the Culiacan Municipal Drinking Water and Sewer Authority (Junta Municipal de Agua Potable y Alcantarillado de Culiacán (JAPAC)). According to reports from the local water utility, physical leaks primarily occur at household connections and account for a water loss of approximately 30%.

The geometric data for the drinking water supply network, including the pipes of all diameters (2 to 18 in) as well as the population, the demand and other data necessary for the hydraulic modeling, were introduced into the EPANET software program.

The stochastic model that we used covered 1 week (168 h). The results obtained using the deterministic and stochastic models were compared to field measurements to determine the advantages and disadvantages of both models. Figure 1 shows the location of the nodes and links analyzed within the Humaya sector in Culiacan, Sinaloa. However, due to space and time constraints, this work only discusses a few nodes and links.

Fig. 1 Location of the network nodes and links analyzed

The hourly variation curve presented here is an idealized model of water demand. It was generated using measurements of water demand in residential and commercial areas in hydraulically isolated sectors of some distribution networks called hydrometric districts or District Metering Areas (DMAs) (Fig. 2).

Fig. 2 Hourly demand variation curve for all of Mexico (Tzatchkov 2007)

It is important to note that the methodology used to construct this curve included water demand by users and leaks within the analyzed networks. Additionally, the curve is smooth; when it was compared with continuous measurements of household demand, however, its smooth form did not represent reality.

3 Stochastic Focus

Recognition that demand is random has led some researchers (Kiya and Murakawa 1989; Buchberger and Wu 1995) to propose that demand over time follows a Poisson process. Buchberger et al. (2003a) verified this hypothesis. The process is not homogeneous because demand varies considerably throughout the day. Each water demand event is represented as a rectangular pulse whose height represents its intensity and whose width represents its duration.

Demand simulation models that generate water demand series using stochastic criteria were recently developed; one example is the PRP model (Buchberger et al. 2003a). These models are based on the following basic parameters: the arrival rate λ (representing the mean frequency of occurrence of the individual pulses), the mean intensity of the pulses \( \mu_X \), the variance of the intensity \( Var(\mu_X) \), the mean duration of the pulses η and the variance of this duration Var(η). These parameters were generally obtained by using demand measurements with a one-second time step. Although measuring with a one-second time step has the advantage of directly monitoring the real-time evolution of residential demand, it requires sophisticated measurement and data storage equipment. There is also a high computational demand when analyzing the generated data (Buchberger et al. 2003a).

In recent years, techniques have been developed that are geared toward the indirect estimation of the parameters λ, \( \mu_X \), \( Var(\mu_X) \), η and Var(η) for demand data spanning longer intervals, especially when space and time disaggregation is required (Alcocer-Yamanaka et al. 2008a, b; 2009a, b; Guercio et al. 2001; Rodríguez-Iturbe et al. 1984; 1987). The estimation of the parameters is based on the establishment of an objective function that expresses the relationship between the statistical moments of the observed data series and the theoretical moments of the model. This objective function is minimized through non-linear programming techniques (Bazaraa et al. 1993), yielding the desired parameters. Nadimpalli and Buchberger (2003) compared these techniques as they apply to the problem of estimation of the parameters based on examples. All of the models assume a known variation of the demand in a pipeline that supplies a set number of houses. The techniques differ from each other with respect to the type of probability distribution assumed to describe the behavior of certain parameters, including the duration, mean intensity and frequency of the pulses. The models also differ with respect to the stochastic process used to formulate the theoretical moments involved (Rodríguez-Iturbe et al. 1984).

The NSRP model was first proposed in the field of hydrology, where it represents rainfall as a two-level stochastic process. At the first level, the occurrence of rainfall events is modeled as a Poisson process with arrival rate λ. At the second level, each event is characterized by a certain number C of internal pulses, where C is a random number with mean \( \mu_C \). The mathematical theory of the method is described in many references (Rodriguez-Iturbe et al. 1984, 1987; Entekhabi et al. 1989). The NSRP model has been used to model water demand by Alvisi et al. (2003) and Alcocer-Yamanaka et al. (2008a, b and 2009a, b). The analogy between rainfall events and residential water demand shown in Table 1 is used to transfer the method to modeling water demand. Figure 3 shows a flowchart of the methodology proposed in this paper, as explained below.

Table 1 Modeling analogy between rainfall events and residential water demand
Fig. 3 Flowchart of the proposed methodology

The second-order moments of the aggregated process \( Y_i^{(h)} \) are the following (Entekhabi et al. 1989):

$$ E\left[ Y_i^{(h)} \right] = \frac{\lambda}{\eta}\mu_C \mu_X h \tag{1} $$

$$ \begin{aligned} Var\left[ Y_i^{(h)} \right] = {} & \frac{\lambda}{\eta^3}\left( \eta h - 1 + e^{-\eta h} \right)\left[ 2\mu_C E\left[ X^2 \right] + E\left[ C^2 - C \right]\mu_X^2 \frac{\beta^2}{\beta^2 - \eta^2} \right] \\ & - \lambda \left( \beta h - 1 + e^{-\beta h} \right) E\left[ C^2 - C \right]\mu_X^2 \left[ \frac{1}{\beta \left( \beta^2 - \eta^2 \right)} \right] \end{aligned} \tag{2} $$

$$ \begin{aligned} Cov\left[ Y_i^{(h)}, Y_{i+k}^{(h)} \right] = {} & \frac{\lambda}{\eta^3}\left( 1 - e^{-\eta h} \right)^2 e^{-\eta \left( k - 1 \right) h}\left[ \mu_C E\left[ X^2 \right] + \frac{1}{2} E\left[ C^2 - C \right]\mu_X^2 \frac{\beta^2}{\beta^2 - \eta^2} \right] \\ & - \lambda \left( 1 - e^{-\beta h} \right)^2 \left[ \frac{1}{2} E\left[ C^2 - C \right]\mu_X^2 \frac{1}{\beta \left( \beta^2 - \eta^2 \right)} \right] e^{-\beta \left( k - 1 \right) h}, \quad k \geqslant 1 \end{aligned} \tag{3} $$

where \( \lambda^{-1} \) represents the mean time between two events, \( \beta^{-1} \) is the mean time between each individual pulse and the start of the event, \( \eta^{-1} \) is the mean duration of the pulses, \( \mu_X \) is the mean intensity of the pulses, and h is the analyzed aggregation/disaggregation interval.
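As an illustration, Eqs. 1 to 3 can be evaluated numerically with a short function. The sketch below is not code from this study; the function name and argument names are hypothetical, and the caller is assumed to supply \( E[X^2] \) and \( E[C^2 - C] \), which depend on the distributions chosen for the pulse intensity X and the number of pulses C.

```python
import math


def nsrp_moments(lam, eta, beta, mu_c, mu_x, ex2, ec2c, h, k=1):
    """Return (mean, variance, lag-k covariance) of the aggregated NSRP process.

    ex2 stands for E[X^2] and ec2c for E[C^2 - C]; both are inputs because
    they depend on the distributions assumed for X and C (an assumption of
    this sketch, not something fixed by the equations).
    """
    if beta == eta:
        raise ValueError("beta must differ from eta: beta^2 - eta^2 is a denominator")
    d = beta ** 2 - eta ** 2

    # Eq. 1: mean of the aggregated process
    mean = lam / eta * mu_c * mu_x * h

    # Eq. 2: variance of the aggregated process
    var = (lam / eta ** 3 * (eta * h - 1 + math.exp(-eta * h))
           * (2 * mu_c * ex2 + ec2c * mu_x ** 2 * beta ** 2 / d)
           - lam * (beta * h - 1 + math.exp(-beta * h))
           * ec2c * mu_x ** 2 / (beta * d))

    # Eq. 3: covariance between intervals i and i + k (k >= 1)
    cov = (lam / eta ** 3 * (1 - math.exp(-eta * h)) ** 2
           * math.exp(-eta * (k - 1) * h)
           * (mu_c * ex2 + 0.5 * ec2c * mu_x ** 2 * beta ** 2 / d)
           - lam * (1 - math.exp(-beta * h)) ** 2
           * 0.5 * ec2c * mu_x ** 2 / (beta * d)
           * math.exp(-beta * (k - 1) * h))
    return mean, var, cov
```

A call such as `nsrp_moments(0.05, 2.0, 0.5, 3.0, 4.0, 32.0, 6.0, 1.0)` uses purely illustrative values with time in minutes and intensity in L/min; they are not parameters estimated in this study.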

Once the expressions of the NSRP model have been defined, the objective function is formulated as:

$$ Z = \left( \frac{F_1\left( \xi \right)}{F'_1} - 1 \right)^2 + \left( \frac{F_2\left( \xi \right)}{F'_2} - 1 \right)^2 + \ldots + \left( \frac{F_n\left( \xi \right)}{F'_n} - 1 \right)^2 \tag{4} $$

where \( F'_1, F'_2, \ldots, F'_n \) are the values of the observed moments, which include the mean, the variance and the covariance (lag-1 correlation). \( F_1, F_2, \ldots, F_n \) are the theoretical moments, which are functions of the parameter vector \( \xi = (\lambda, \mu_X, \mu_C, \eta, \beta) \). A value of n = 3 is assumed in this application, when using the model to estimate residential demand. The three moments in Eq. 4 are the mean, variance and covariance.
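The objective function of Eq. 4 is a sum of squared relative deviations, which a few lines of code make concrete. In this minimal sketch the function name is hypothetical, and the theoretical moments are assumed to have been evaluated already for a trial parameter vector.

```python
def objective(theoretical, observed):
    """Eq. 4: sum over the moments of (F_j(xi) / F'_j - 1)^2.

    theoretical: moments F_j evaluated at a trial parameter vector xi
    observed:    moments F'_j computed from the recorded series (non-zero)
    """
    if len(theoretical) != len(observed):
        raise ValueError("one observed value is needed per theoretical moment")
    return sum((f / f_obs - 1.0) ** 2 for f, f_obs in zip(theoretical, observed))
```

Because each term is normalized by the observed value, moments of very different magnitude (for example a mean and a variance) contribute on a comparable scale, and Z equals zero only when every theoretical moment matches its observed counterpart.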

The analysis interval must be established in the formulation of the NSRP model, in order to implement the optimization scheme (in this study, the time interval was 1 min). Next, the minimization of the objective function is carried out through non-linear mathematical programming techniques (a gradient method coupled with central derivatives and quadratic approximation is used (Bazaraa et al. 1993)). This minimization yields values for each of the parameters of the model.
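To illustrate what a gradient method with central derivatives involves, the sketch below approximates the gradient by central differences and takes fixed-size steps that are clipped to the search ranges. It is a deliberate simplification and not the scheme used in this study: the quadratic approximation step is omitted, and the function names, step size and iteration count are assumptions.

```python
def central_gradient(f, x, step=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return grad


def minimize_bounded(f, x0, bounds, rate=0.1, iterations=200):
    """Fixed-step gradient descent, clipping each variable to its (low, high) range."""
    x = list(x0)
    for _ in range(iterations):
        g = central_gradient(f, x)
        x = [min(max(xi - rate * gi, lo), hi)
             for xi, gi, (lo, hi) in zip(x, g, bounds)]
    return x
```

In an application, `f` would wrap the objective function Z of Eq. 4 as a function of the parameter vector, and `bounds` would hold the search range of each parameter. A fixed step is only adequate for well-scaled problems, which is one reason practical solvers add a line search or a quadratic model.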

To apply the NSRP model to a drinking water distribution network, the generated stochastic patterns must be randomly assigned to the demand at each node of the model (each node has a different number of houses) as a function of the socioeconomic level of the households. This results in the introduction of demand patterns consisting of 10,080 data points, one for each minute elapsed during 1 week. Households were divided into three groups: lower socioeconomic level (18% of the households), middle socioeconomic level (72% of the households) and upper socioeconomic level (10% of the households), according to the criteria established by the Mexican National Water Commission (Tzatchkov 2007), and a separate set of stochastic patterns was generated for each group.
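The random assignment by socioeconomic level can be sketched as follows. The group shares are those stated above; everything else is an assumption made for illustration: the function name, the pattern identifiers, and in particular the choice to draw the group of each house at random from the shares, which stands in for the actual household classification that is not reproduced here.

```python
import random

# Shares of households per socioeconomic level, as stated in the text
GROUP_SHARES = {"lower": 0.18, "middle": 0.72, "upper": 0.10}

# One value per minute over one week: 7 days x 24 h x 60 min = 10,080
MINUTES_PER_WEEK = 7 * 24 * 60


def assign_patterns(n_houses, patterns_by_group, seed=0):
    """Return a list of (group, pattern_id) pairs, one per house.

    patterns_by_group maps each group name to the identifiers of the
    synthetic patterns generated for that group (hypothetical identifiers).
    """
    rng = random.Random(seed)
    groups = list(GROUP_SHARES)
    weights = [GROUP_SHARES[g] for g in groups]
    assignment = []
    for _ in range(n_houses):
        # Assumption: the level of each house is drawn from the shares
        group = rng.choices(groups, weights=weights, k=1)[0]
        # A pattern is then picked at random within that group's pool
        assignment.append((group, rng.choice(patterns_by_group[group])))
    return assignment
```

The demand at a node would then be the sum of the patterns assigned to its houses, each pattern holding `MINUTES_PER_WEEK` values.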

In order to account for the variations in drinking water demand throughout the day, the generated demand series, along with their statistical parameters, were divided into three schedule blocks, as described below. Initially, we determined the necessary parameters for the generation of synthetic series for the 69 households in which the temporal variation of demand was recorded. Subsequently, by applying a Monte Carlo-like technique, a number of equiprobable synthetic series were generated for assembling and validation. Assembling means generating a number of series and calculating the average values of their statistical moments.

By comparing the observed moments and the moments of the assembled series for each hourly block, it can be determined whether the corresponding synthetic series should be accepted and used in the stochastic simulation model. When the difference between the observed and assembled moments is large, the synthetic series are not considered valid for the analyzed pattern; they are discarded and new series are generated (as future work, the mathematical issue of why the optimization procedure fails with some series should be studied). When the difference between the moments is close to zero, both the process and the generated synthetic series are considered valid. The application of this procedure showed that the number of series depends on their variance and that in most cases it is sufficient to work with 50 series. Since the 50 series are equiprobable, any one of them can be used in the stochastic simulation model for the corresponding hourly block.
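The assembling and validation step can be sketched in a few functions: compute the sample moments of each synthetic series, average them over the ensemble, and accept the ensemble when every averaged moment lies close to its observed value. The function names and the 5% relative tolerance are assumptions of this sketch; the text only requires the difference to be close to zero.

```python
def sample_moments(series):
    """Sample mean, variance and lag-1 covariance of one series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    cov1 = sum((series[i] - mean) * (series[i + 1] - mean)
               for i in range(n - 1)) / (n - 1)
    return mean, var, cov1


def assemble(series_list):
    """Average each moment over the ensemble of synthetic series."""
    moments = [sample_moments(s) for s in series_list]
    return tuple(sum(m[j] for m in moments) / len(moments) for j in range(3))


def is_valid(assembled, observed, tol=0.05):
    """Accept when every assembled moment is within tol (relative) of the observed one."""
    return all(abs(a - o) <= tol * abs(o) for a, o in zip(assembled, observed))
```

For one hourly block, `series_list` would hold the 50 synthetic series and `observed` the three moments of the recorded series for that block; a rejected ensemble is discarded and regenerated.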

Following this analysis, 69 curves of simulated demand patterns were generated. Hourly blocks of the generated series were randomly assigned to form a weeklong simulated pattern from Monday to Sunday, and the 69 simulated patterns were then randomly distributed to each household as a function of its socioeconomic level.

3.1 Treatment of the Recorded Data for Residential Demand

Average demand values were obtained by dividing the records into hourly blocks for the various days of the week (Table 2). The division of the data into blocks is based on the numerical disparity observed in the values of the moments at different times of day. This division of the records facilitated the calculation of an optimal solution during the optimization process, thereby facilitating the generation of more accurate synthetic series with respect to reality. In principle, hourly blocks should be selected as a function of the typical behavior of the demand of drinking water, but in this case, they were selected according to the current electric tariff schedule since the results were also used for some energy saving analysis (not presented here). The records were also divided into business day, weekend and holiday groups.

Table 2 Hourly blocks for the analysis of recorded demand

Once the field data was divided into blocks as shown in Table 2, eight hourly blocks were obtained, four for Monday through Friday, two for Saturday and two for Sunday.

Once the series for each hourly block were obtained, the mean, variance, covariance and accumulated volume moments were obtained. Then, in order to obtain statistical parameters that are valid for the NSRP model, it was necessary to define the solution space of the optimization model. This was done by using search ranges based on the reduction of intervals in the order of magnitude of the decision variables, based on field measurements. Finally, the synthetic series that were used in the public domain software program EPANET were generated.

3.2 Generation of Synthetic Series and Determination of Search Ranges in the Optimization Model to Obtain the NSRP Model Parameters

After the observed moments had been obtained from field data for the households where measurements were taken, the statistical parameters \( (\lambda, \mu_X, \mu_C, \eta, \beta) \) involved in the theoretical moments of the objective function (Eq. 4) were determined. These parameters were then introduced into the NSRP model.

The generation of the series was based on the public domain model found in the Rainfall Data Modeling Portal, RDMP (Mellor 2007). Because the generation of these series is a stochastic process, a certain number of simulations have to be performed within the NSRP model, each of them using a different random number generation seed. Finally, for verification purposes, the synthetic series obtained with the NSRP model were compared with the series obtained in the field.
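To make the generation step concrete, the sketch below simulates an NSRP process directly and aggregates it to intervals of length h. It is not the RDMP implementation. The distributional choices are assumptions made for illustration only: the number of pulses per event is taken as one plus a Poisson variate with mean \( \mu_C - 1 \), and pulse intensities are taken as exponential with mean \( \mu_X \). All function and argument names are hypothetical.

```python
import random


def _poisson(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals falling below `mean`."""
    count = 0
    acc = rng.expovariate(1.0)
    while acc < mean:
        count += 1
        acc += rng.expovariate(1.0)
    return count


def generate_nsrp(lam, beta, eta, mu_c, mu_x, n_steps, h=1.0, seed=0):
    """Return n_steps demand volumes, one per interval of length h."""
    if mu_c < 1.0:
        raise ValueError("this sketch assumes at least one pulse per event (mu_c >= 1)")
    rng = random.Random(seed)          # a different seed gives a different series
    horizon = n_steps * h
    series = [0.0] * n_steps
    t = rng.expovariate(lam)           # origin of the first event
    while t < horizon:
        n_pulses = 1 + _poisson(rng, mu_c - 1.0)
        for _ in range(n_pulses):
            start = t + rng.expovariate(beta)        # displacement from event origin
            end = start + rng.expovariate(eta)       # pulse duration, mean 1/eta
            intensity = rng.expovariate(1.0 / mu_x)  # pulse height, mean mu_x
            i = int(start // h)
            while i < n_steps and i * h < end:
                overlap = min(end, (i + 1) * h) - max(start, i * h)
                series[i] += intensity * max(overlap, 0.0)
                i += 1
        t += rng.expovariate(lam)      # next event origin (Poisson process)
    return series
```

With time in minutes and intensity in L/min, `generate_nsrp(lam, beta, eta, mu_c, mu_x, 10080)` produces a one-week series of volumes per minute; repeating the call with different `seed` values yields the independent realizations described above.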

An approximation of the search ranges of the mentioned parameters was performed using data reported by Feliciano (2005). Arrival rates λ ranging from 0.0689 min⁻¹ (mean time between events \( \lambda^{-1} \) = 14.51 min) to 0.04305 min⁻¹ (23.23 min) were used. These values yielded a search range of 1 min⁻¹ (1 min) to 0.0404 min⁻¹ (24.75 min). This broader range in the optimization model was desirable as it was observed that having a range less than or equal to 24.75 min resulted in the objective function solution drifting from zero. It is important to mention that the ranges reported by Feliciano (2005) correspond to an area adjacent to that of this study and that the time step used in Feliciano’s study was one second.
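The values in parentheses above are the reciprocals of the arrival rates, that is, the mean times between events. A short check, with a hypothetical helper name, reproduces them:

```python
def mean_interarrival_minutes(rate_per_min):
    """Mean time between events (min) for a Poisson arrival rate given in 1/min."""
    return 1.0 / rate_per_min


for rate in (0.0689, 0.04305, 1.0, 0.0404):
    print(f"{rate} 1/min -> {mean_interarrival_minutes(rate):.2f} min")
```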

The next parameter to be adjusted was the mean pulse intensity (\( \mu_X \)). The mean values of pulse intensity ranged from 1 to 8 L/min. However, such values “forced” the optimization scheme, resulting in very large and unrealistic demand values. After several trials, it was determined that the upper limit had to be decreased from 8 L/min to 6 L/min in order to decrease the mean intensity and, as a result, decrease the pulse intensity in the synthetic series. This resulted in a more accurate approximation of the observed variance by the variance of the synthetic series (Table 3).

Table 3 Search ranges to obtain the NSRP model parameters

Table 4 shows the results from one of the households (the breakdown is omitted due to the length of the manuscript). The last row contains the value of the objective function Z defined by Eq. 4. Its value should be close to zero for the obtained solution to be good. In Table 4 it is very small (practically zero) for most of the hourly blocks, except for Mon–Fri 6–20 h and Sun 0–19 h. That means the solution was not good for these two hourly blocks. The main reason is that the mean water demand, and thus the water volume, consumed in these two hourly blocks is much higher than the mean water demand for the rest of the blocks, explained by some background leakage (Tzatchkov et al. 2005). Future work is needed to include such leakage in the proposed model.

Table 4 Results from one household obtained by the NSRP model

As previously mentioned, the calculated parameters were used to generate 50 synthetic series with data every minute, for each of the eight hourly blocks and for each of the 69 households. Thus, 27,600 synthetic series were created covering 1 week’s worth of water demand that represent demand patterns for the 69 households analyzed.

Each node of the model had a different number of assigned houses. Each house was assigned a stochastic pattern and a mean level of demand based on the number of houses at the node. The duration was set to 1 week, and the demand levels were assigned to the nodes as a function of the covered areas. The assigned demands were obtained from the 69 simulated demand patterns previously generated. It is important to mention that the synthetic patterns correspond to the demand of the households, and each simulated pattern corresponding to a particular household was input into EPANET.
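One way to pass such patterns to EPANET is through the [PATTERNS] and [TIMES] sections of its input file, with the pattern and hydraulic time steps set to 1 min and the duration to 168 h. The sketch below only formats those two sections as text; it is not the procedure used in this study. The function names and pattern identifiers are hypothetical, and the pattern values are assumed to be dimensionless multipliers of each node's base demand.

```python
def format_pattern(pattern_id, multipliers, per_line=6):
    """Format one demand pattern as [PATTERNS] rows: the ID followed by multipliers."""
    lines = []
    for i in range(0, len(multipliers), per_line):
        chunk = " ".join(f"{m:.4f}" for m in multipliers[i:i + per_line])
        lines.append(f" {pattern_id:<8} {chunk}")
    return "\n".join(lines)


def build_inp_fragment(patterns):
    """Return the [PATTERNS] and [TIMES] sections for a one-week, 1-min simulation."""
    out = ["[PATTERNS]", ";ID       Multipliers"]
    for pattern_id, multipliers in patterns.items():
        out.append(format_pattern(pattern_id, multipliers))
    out += [
        "",
        "[TIMES]",
        " Duration            168:00",   # 1 week, in hours:minutes
        " Hydraulic Timestep  0:01",     # 1-min hydraulic step
        " Pattern Timestep    0:01",     # each multiplier lasts 1 min
    ]
    return "\n".join(out)
```

A pattern that continues over several rows repeats its ID at the start of each row, which is how `format_pattern` splits a 10,080-value series into short lines.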

The 69 synthetic demand patterns were classified into three groups of socioeconomic levels, and the households within each socioeconomic level were assigned randomly selected demand patterns from the corresponding group. Each pattern contains 10,080 data points, which correspond to the demand with a one-minute time step and a duration of 7 days.

Below are the results for this model at the same nodes and links that were evaluated with the deterministic model along with comparisons between the model results and the field measurements.

4 Comparison of Results from the Stochastic and Deterministic Models

Pressure and flow rate measurements were taken in the field at various nodes and pipes in the system where the deterministic and stochastic models were applied. Due to space constraints, a limited amount of data is presented in this paper. Figure 4 shows the comparison between the measured pressures and those obtained with both models at node 165 (see node location in Fig. 1). A daily fluctuation was observed due to the stochastic variation in the pattern of the synthetic series. Note that the measurements were better represented by the stochastic model in terms of behavior. The pressure variation is abrupt and high in the field measurements and in the stochastic model, contrary to the smooth pressure variation predicted by the HDVC model. Some of the values obtained by the stochastic model are much lower than the observed field values, however. Future work is needed to explain these differences. One possible explanation is that being an extended period (quasi-dynamic) model, the EPANET model is insufficient to represent the effect of the highly variable stochastic water demand, so that a truly dynamic model is needed. Such analysis is beyond the scope of this paper, however.

Fig. 4 Comparison of the pressures recorded at node 165 with the values from the models

In the stochastic model, there were very low pressures at the analyzed node due to the simultaneous nature of the demand. After demand has stopped, the pressure increases.

The maximum and minimum pressures displayed in the stochastic scenario occur in periods of up to 1 min. This is the analysis time step set in the simulation and in the stochastic patterns. It is worth explaining that the pressures calculated using the stochastic model, in particular the low pressures, were obtained using the assumptions that are the basis of EPANET, that is, an extended period analysis in which the variations in flow rate and pressure are assumed to be slow. Such slow variations may be unrealistic in the case of stochastic demand, which varies abruptly. Therefore, the modeling should be carried out with a more refined dynamic model capable of representing abrupt variations in hydraulic variables. However, the discussion of such a model is beyond the scope of this paper.

We continued the analysis by reviewing the flow behavior in the same segment examined with the deterministic model (link 2597). Figure 5 shows the flow rate in this link, a 12-in. diameter pipe that supplies a zone of the analyzed area, where one-way flow rates were obtained. The behaviors of the flow rates and pressures were quite variable in this model. Sudden changes were caused by the random generation of demand patterns, which caused certain instants (of the order of minutes) to have high demand followed by near-zero demand in the next minute. The regulation tank and the pumping equipment absorb these variations.

Fig. 5 Comparison of the measured flow rate with the HDVC and the stochastic models at link 2957

The flow rates present in the pipes of this model can be positive or negative over time, which indicates a change in the direction of the flow. The absence of flow represents periods of stagnation or periods of heightened residence time of water in the pipes.

The high variability of the observed flow rate was also more accurately predicted using the stochastic model, as it better reflected the pattern of demand by pulses (Fig. 5), but the results of the stochastic model show a much greater variation in flow rate than the measurements. As with the pressure results, the EPANET model employed is perhaps not capable of representing such abrupt variations in the flow rate, and in future work a more refined (truly dynamic) model should be developed.

The study area consists of residential water users only. Other kinds of demand that have stochastic behavior can also be considered in the proposed methodology, e.g. commercial demand, although their parameters will be different and need to be obtained separately.

5 Conclusions

This article demonstrates the application of stochastic concepts to the modeling of residential water demand patterns. The NSRP model was applied to a hydraulic simulation model, which yielded results that better resemble the measured pressure and flow behavior (but not necessarily their values) for a drinking water distribution network, compared to the traditional HDVC approach.

The results of this work lay the foundations for a new, simple and practical tool for engineers and researchers who are dedicated to the design and maintenance of drinking water distribution systems. This model could be implemented by incorporating these methods into a module within commercial and public domain computer programs such as EPANET.

The HDVC approach is simple but inaccurate at local (residential) level. The stochastic model represents more realistically the water demand, but predicts excessive flow and pressure variations that should be explained in future work. Another advantage of the stochastic model is that it allows estimating leakage in the network. The HDVC includes physical losses, and when compared with direct measurements, it is possible to observe leakage when the HDVC and the mean flow rate are above the curve that represents the real level of user water demand.

Future work should also focus on automating the process of generating stochastic demand series and on establishing a Monte-Carlo model in the simulation process, as well as an analysis on the applicability of extended period models (such as the EPANET model, used in this paper) for modeling stochastic water distribution networks at different time and space scales.