Keywords

1 Introduction

Given a water distribution network and some forecasts about consumption at different nodes of the network over a specific time horizon, the problem addressed here is the scheduling of water pump jobs such that water demands are satisfied and conform to quality standards set by French and European legislation while minimizing operating costs (water treatment and electricity).

Among the approaches in the literature for solving this problem, we find linear programming (LP) [1], nonlinear programming [2], ant colony optimization [3], and genetic algorithms [4].

Each method has its own advantages and drawbacks, and a strategy may be more or less efficient, depending on the characteristics of any given network. The characteristics of the networks that concern us in particular are as follows:

  • The considered networks are very large (about 100 pumps and 50 storage tanks).

  • The computation time available for devising optimized solutions is relatively short (30 min maximum) because of the real-time scheduling constraint.

  • The discrete behavior of pumps needs to be expressed through the model. The pumping stations work at some defined level of pumping depending on the type and the power of the station and changes in the pumping regime can take place only periodically.

We chose to use LP for several reasons. First, our intention is to enhance an existing tool that is based on LP. Secondly, LP provides an appropriate modeling tool for most of the constraints encountered in drinking water distribution networks, including some related pressure constraints (known to be difficult to handle within an LP framework but that we have managed to express in a linear form). Finally, efficient solvers are already available.

The starting point is an existing software tool, Saphir [5], which already models the pump-scheduling problem with LP. Our aim is to enhance this tool so as to achieve the following:

  • Modeling, through the use of LP, new constraints that have not been expressed before.

  • Expressing in a linear form the hydraulic phenomena in the system.

  • Placing a limit on computation time while dealing with integer LP models of large dimensions.

The first difficulty is to model new elements of the network (such as pumps with variable flow) and to take into account the discrete behavior of pumps. This discrete behavior makes the optimization problem more difficult to handle: while the continuous problem can be solved with a polynomial algorithm, the mixed integer linear programming (MILP) problem is NP-hard and requires a branch and price procedure [6], but it represents the real-life operation of the pumps much better than a continuous approach. We therefore included in this formalization new constraints that have not been expressed before. The second difficulty is to respect or even reduce the limits placed on computation time, and to this end, we have looked at a number of different strategies.

In this chapter, we describe the previous existing formalization of a water network with MILP, we describe the new constraints, we provide some results, and we show how these results allow a better choice of parameters in the optimization.

2 Water Distribution System Modeling

In this section, we present the main elements of a water distribution network, and we list the associated structural constraints.

2.1 Elements

A water distribution system can be described by nodes and edges; each element in the network implies the creation of a set of constraints. The time horizon is divided into different time slots. The aim of the optimization is to determine whether each element, and in particular each pump, should operate or not operate within each time slot.

Let us consider the nodes and their associated constraints. We consider four different kinds of node:

  • Production points must respect the maximum flow production and the discrete behavior of the pumps. Some of these production points have a semi-discrete behavior, as their flow can vary between two fixed values. Each production point has only one outgoing pipe in our model.

  • Storage tanks must respect the maximum and minimum volumes of water they can contain. The level of each storage at the end of the day also has a required value, and the non-respect of this constraint penalizes the objective function (the penalty is commensurate with the difference observed).

  • Consumption points have only one input pipe in our model.

  • Connection nodes represent points where several pipes are connected.

Each node in the model needs to respect an additional constraint, namely continuity of flow. For every connection node and storage tank, the volume of water entering the node during a given time slot is equal to the outgoing volume (with an acceptable gap corresponding to the variation of volume inside the storage tank).

The volume produced during a time slot by a production element is equal to the volume of water carried by the outgoing pipe during this time slot. Finally, the volume required by a consumption point during a given period is equal to the volume carried by its input pipe.

Let us now consider the different pipes to be found in the network model:

  • Non-gravitational pipes have exactly the same constraints as for production points, because both are composed of pumps.

  • Gravitational pipes have a continuous behavior: They only need to respect the maximum volume they are able to carry during a given time slot.

Each element of the network has to respect this basic set of structural constraints, to ensure continuity of flow and the coherence of the model. A large network therefore implies a large number of variables and constraints to be taken into account, which is one of the difficulties of our problem. Because of the discrete behavior of the pumps, we have to employ integer variables, which means using integer linear programming (ILP) models. The larger the network, the larger the number of these integer variables, which seriously impacts computation time.

2.2 Mathematical Formulation

We now introduce the mathematical formulation for the main elements of the network.

  • For a production element p:

$$ q_{p,t} = f_{p,t} \times Q_{p,t} $$
(1)
$$ q_{p,t} = q_{k,t} $$
(2)

where parameter t represents the index of the considered time slot; element p is the index of the production node. Parameter Q p,t is the water volume that can be produced if element p operates during time slot t.

Element k is the index of the single output pipe from production node p.

Variable f p,t is a binary variable that represents the operation (f p,t  = 1) or non-operation state (f p,t  = 0) of element p. Variable q p,t is the volume of water actually produced by production node p during time slot t.

Variable q k,t is the volume of water carried by pipe k during time slot t.

Constraint 1 represents the flow of element p at each time slot.

Constraint 2 is the flow continuity constraint, which gives a flow value to element k.

Most of the time production elements have a discrete behavior as described above. But some production elements have a partial continuous behavior, which means that operation is discrete between 0 and a first flow value, and continuous between this value and the maximum flow. To take this behavior into account in the model, two variables f p,t are created and the equations rearranged.

$$ q_{p,t} = f_{p,t} \times Q_{p,t} + f_{p,t}^{\prime } \left( {Q_{p,t}^{\prime } - Q_{p,t} } \right) $$
(3)
$$ f_{p,t}^{\prime } \le f_{p,t} $$
(4)
$$ 0 \le f_{p}^{\prime } ,\;t \le 1 $$
(5)

where f p,t still represents the operation or non-operation of pump, Q p,t is the first acceptable flow value, Q p,t is the maximum flow, and f p,t is a continuous variable that allows the flow to fluctuate between the first flow value and the maximum flow.

  • For a consumption element c:

$$ q_{k,t} = q_{c,t} $$
(6)

where k is the index of the single input pipe into consumption point c, variable q k,t represents the volume transported by pipe k during time slot t, and variable q c,t represents the consumption demand of node c during time slot t.

  • For each connection node o, the flow continuity constraint can be expressed as follows:

$$ \sum\limits_{{k \in U_{0}^{ + } }} {q_{k,t} } = \sum\limits_{{k \in U_{0}^{ - } }} {q_{k,t} } $$
(7)

where set U + o is the set of successors of element o, and set U o is the set of predecessors of element o.

  • Finally, a very similar expression can be obtained for a storage point l:

$$ \sum\limits_{{\mathop {k \in U}\nolimits_{l}^{ + } }} {q_{k,t} } = \sum\limits_{{\mathop {k \in U}\nolimits_{l}^{ - } }} {q_{k,t} } + \Updelta (n_{l,t} ) $$
(8)

where variable Δ(n l,t ) represents the variation in the volume held in storage tank l during time slot t.

3 Operational Constraints

In addition to these structural constraints that describe the functioning network, a number of operational constraints have to be considered. We first provide an exhaustive list of the operational constraints to be taken into account and then focus on two of them in particular.

3.1 Operational Constraints

Constraints are more or less hard to express. The following are qualified as “simple” because they can be expressed through LP using conventional tools.

The constraints to be found already in the previous LP model are as follows:

  • The maximum or minimum volume to be pumped in a day for a given production element.

  • The maximum electricity power levels defined by the electricity supplier’s contractual obligations.

  • The list of elements that must work only in cases of emergency.

  • The maximum incremental rate for the flow in some elements of the network.

The constraints added in our new ILP model are as follows:

  • Simultaneity and exclusion constraints (pumps that cannot work at the same time or that are required to work at the same time).

  • A maximum inactivity period length for some pumps.

  • Ensuring a required water quality level through the mixing of water from different sources in some storage tanks (for instance by guaranteeing a percentage from different water sources).

  • Avoiding stagnation in storage tanks to ensure water quality.

  • Taking account of transfer delays when using very long pipes.

We now focus on the stagnation avoidance constraint and on a specific constraint that does not appear in the list.

To maintain a high quality of distributed water, stagnation has to be avoided. Most of the time water renewal occurs naturally, but in some areas with low consumption, a constraint needs to be added in relation to storage.

We denote as δ the maximum time water may remain in the tank. For each time slot t, we can write:

$$ n_{l,t} \le \sum\limits_{t}^{t + \delta } {\sum\limits_{{k \in U_{l}^{ + } }} {q_{k,t} } } $$
(9)

where n l,t represents the volume held in tank l at time slot t, and set U + l is the set of pipes leading out of tank l. In other words, the volume held in the tank at a given time slot may not exceed the outgoing volume over the following δ hours, in order to force a water turnover.

Because of the diversity of existing networks, highly specific constraints may sometimes be encountered. The constraint described below was encountered in only one network and required particular modeling. Two tanks are connected by both a gravitational and a non-gravitational pipe. The use of a gravitational pipe is possible only when the difference in levels is sufficiently large (Fig. 1).

Fig. 1
figure 1

Two tanks connected by a gravitational and a non-gravitational pipe

We expressed this constraint with ILP as follows:

$$ h_{t + 1} \ge m\left( {1 - f_{g,t} } \right) $$
(10)
$$ h_{t} \ge m\left( {1 - f_{g,t} } \right) $$
(11)
$$ 0 \le q_{g,t} \le f_{g,t} \times Q_{g,t}^{ + } $$
(12)

where variable h t represents the difference in levels between tanks S1 and S2 at the beginning of time slot t (h t+1 represents this difference at the beginning of the next time slot following t, that is to say t+1).

Parameter m is the minimum value possible for variable h t .

Parameter Q + g,t is the greatest possible volume that can be transported by gravitational pipe during time slot t.

As before, the binary variable f g,t stands for the operation or non-operation of gravitational pipe g during time slot t.

These equations represent the possibility or the impossibility of using gravitational pipe g; the same equations can be used to represent the availability of pipe g′, by changing variable h t .

If the difference in levels is negative at the beginning of the time step (represented by h t ) or at the end of the time step (represented by h t+1 ), then the variable f g,t is equal to zero. It is therefore impossible to use the gravitational pipe because the flow q g,t will be null.

3.2 Production-Related Constraints and Simplification

As explained above, expressing the problem through ILP increases computation time, which is a problem when seeking real-time solutions. To improve the solving process, we propose the aggregation of some pumps in a “super-pump” structure, in order to keep track of only the feasible flows. For example, having three identical pumping stations linked to the same connection node means that to obtain the same flow on this node, three pump combinations are possible, and so we have more variables than are needed to represent the same information, that is feasible flow. Therefore, under certain conditions, it will be more efficient to aggregate these three pumping stations into a single one.

The delicate aspects of this are as follows:

  • Deciding when pumps cannot be aggregated (for example, if a particular constraint affects only one of the pumping stations).

  • Transferring constraints from the previously separate pumps to the new aggregate one.

Conversely, in order to express certain constraints, it may be helpful to separate a pumping station with several allowed flows into different “simple pumps”.

For example, consider four pumps in the same pumping station. One of them can send water in two different directions, and only three pumps out of the four can work at the same time because of the maximum power consumption allowed by the electricity supply contract. To simplify, we separate the first pump into two (with an exclusion constraint meaning that they cannot operate during the same time slot), and express the constraints on the five pumps (Figs. 2, 3).

Fig. 2
figure 2

Initial real configuration of pumps

Fig. 3
figure 3

New configuration and constraints associated

Depending on the constraint difficulty, two strategies may be helpful:

Dividing a complex production element into “pump units” may be helpful if there is no other simple way to express particular constraints, but this means increasing the number of variables.

Where possible we aggregate pump units, in order to limit the number of variables and equations. For instance, situation involving only the pumps P1, P2, and P3 can led to a simple case of aggregation of pumps.

3.3 Hydraulic Phenomena

The hardness of modeling hydraulic phenomena is the first reason cited in the literature to disqualify the use of LP for the pump-scheduling problem. It was therefore especially important for us to show that hydraulic phenomena can be taken into account with a simplification precise enough for our needs. If the linearization of the hydraulic behavior of components is not possible, or unsatisfactory—as a result, for example, of an unacceptable increase in the number of constraints and the length of computation time—an alternative strategy may be to work with a hydraulic module during the optimization, in order to obtain the real flow values. This is already done in some applications such as the Derceto software [7].

A situation involving hydraulic phenomena arises when there are different gravitational pipes that bring water to or from a particular location, while the water distribution depends on pressure. The equations representing this situation are clearly nonlinear.

Another problem with pressure is water exchanges between storage tanks that supply the same consumption area. Given the difference in pressure between them their water levels tend to become balanced. To understand this phenomenon, we studied experimentally the behavior of two tanks supplying the same consumption area, with and without consumption.

The two storage tanks provide a certain amount of water to satisfy demand, and when consumption is low or null the water in the two of them will move toward the same level (Fig. 4). The equations above (obtained with Kirchhoff’s nodal rule and Kirchhoff’s mesh rule) represent the mathematical formalization of the flows Q 1, Q 2, and Q 3 between points A, B, and C (Fig. 5). We shall not explain the notation here, as our aim is simply to show the nonlinearity of the equations.

Fig. 4
figure 4

Parallel tanks supplying the same consumption area C

Fig. 5
figure 5

Hydraulic behavior equations

We attempted to model experimentally the movements using the hydraulic software tool EPANET. We found that the curve representing the exchanges between A and B when there is no demand can be linearized (Fig. 6).

Fig. 6
figure 6

Flow from storage B to A with respect to the difference in levels between B and A

We performed other experiments with nonzero consumption and for different pipe parameters, and we concluded that in the case of two parallel storage tanks, the different flows can be calculated from the experimental curves, based only on knowledge of consumption and water levels.

We generalized this approach to a slightly more complicated case with 4 tanks, 10 pipes, and 3 consumption points (Fig. 7).

Fig. 7
figure 7

New test network modeled with EPANET

In addition to the flow values in the pipes, we add a pressure value on the nodes. On each pipe, we add a linear equation of the following form, where f is a linear function:

$$ q_{k,t} = f\left( {h_{i,t} ,h_{f,t} ,H_{k} } \right) $$
(13)

The flow through a pipe q k depends on the pressure h i,t in its initial node, in the pressure h f,t in its final node, and on certain known parameters of the pipes H k (such as diameter and length).

For 19 different combinations of storage level values, we calculated the flow and pressure values on this small network, both with our linear model and with the hydraulic software tool EPANET (Table 1).

Table 1 Comparison in percentages between the results from EPANET and from our linear model

We know that our results cannot be exactly the same as those calculated from a real hydraulic model, owing to the simplifications we introduced. Regarding the margins of error calculated, we conclude that they are acceptable for our needs. Moreover, the largest differences in percentages correspond to only small differences in absolute values.

We now have to test this linear model on a real network. The most significant concern is the likely increase in computation time as a result of the new equations added. It may be possible to address this problem by reducing the number of integer time steps, as shown in Sect. 3.2.

4 Solution Strategies

The model as it is formalized can provide the network manager with a pumping schedule for the day. But the efficiency and the speed of the search for a solution can still be improved, as we see in this section. First, we discuss the formulation of the objective function. Then, we describe the optimization parameters that can help to improve the efficiency of the search for a solution.

4.1 Two-Stage Optimization Strategy

The objective function is formulated as follows:

$$ f = \sum\limits_{t = 1}^{H} {\sum\limits_{x} {C_{x,t} } f_{x,t} + \sum\limits_{t = 1}^{H} {\sum\limits_{y} {K_{y} } k_{y,t} } } $$
(14)

where parameter H is the time horizon (the number of time slots in the optimization). Parameter C x,t represents economic cost penalties, and variable f x,t represents the operation or non-operation of the corresponding element x during time slot t. Parameter K y,t is the penalty for not respecting constraint y; variable k y,t is different from 0 only if constraint y is not respected at time slot t.

One problem of this objective function is its heterogeneity. Penalties C x,t represent real economic costs, while penalties K y are set arbitrarily to enforce the respect of some constraints if possible.

Introducing these two goals into the objective function (to ensure that operational constraints are met as far as possible while minimizing the economic cost) is not the most efficient way to proceed. It might even be counterproductive, in that it can lead to aberrations if the arbitrary penalty is not well chosen (for example, failure to respect a particular constraint, because this is a less expensive option, whereas optimization of economic cost should only begin once all operational constraints have been met).

A two-stage optimization might be an answer to this problem. First, we optimize taking into account only the constraint penalties. At the end of this step, we are able to set the maxim penalties allowed for the non-respect of each constraint. For example, we know the smallest deficit value for the storage tanks that will still allow us to reach a feasible solution.

Once these new constraints have been established, we can start to optimize in the second step, taking into account only the economic penalties.

We tested this approach with a time limit of 1,800 s (30 min). We compared a direct optimization with a two-stage resolution. After a few trials, we decided to devote the largest amount of available computation time to the first stage (1,790 s) and only 10 s to the second stage (Fig. 8).

Fig. 8
figure 8

Comparison between economic costs of a direct or two-stage optimization

The cost of the objective function is always less for direct optimization, but most of the time (except for five instances out of twelve) two-stage optimization better respects the constraints of the problem.

Our test showed us that the most difficult part of the optimization problem was emphatically not the economic aspect, but the respect of all constraints (as stated above, we had to set the first step of the optimization to 1,790 s, and the second step to only 10 s). Two-stage optimization is therefore relevant to our problem, but might be refined to obtain better results. The values of the penalties, in particular, have to be chosen very carefully.

4.2 Partially Continuous Relaxation Strategy

Another idea for reducing computation time is to reduce the number of integer time steps. As we are periodically recalculating the pump scheduling because consumption predictions are constantly being updated, it is possible to use integer time steps during an initial phase only, before switching to continuous time steps for the remainder of the solution process.

The correct schedule obtained in this fashion for the first time steps can be implemented directly. For subsequent time steps, we will have continuous values (somewhere between 0 and 1) for the pump operation. It will not be possible to use these values directly, but they will give us an idea of the volumes to be exchanged.

We tested the computation time for 20, 12, and 4 initial integer time steps, followed by continuous time steps (Table 2, Fig. 9).

Table 2 Gap between best bound and best feasible solutions for different numbers of integer time steps
Fig. 9
figure 9

Comparison of gap for 20 and 24 integer time steps

Reducing the number of integer time steps considerably increases the efficiency of the solution process. For the test with only four integer time steps, we obtained an optimal solution in less than 10 s for each instance.

But we have to determine the smallest number of time steps necessary to guarantee the quality of the solution.

4.3 Enlarging the Horizon Time

Another problem is planning for the final time slots of the day. Some constraints are hard to take into account unless we have some knowledge concerning the following periods. For example, in the case of the constraint designed to avoid stagnation in the storage tanks, we need to know if consumption will be low or high just after the end of the optimization.

To overcome this difficulty, we planned to extend the optimization horizon: If the requested planning was over 24 h, the optimization would be over a period of 48 h so as to avoid the problem of non-visibility affecting the final periods. But extending the optimization horizon increases the computation time, which is a serious drawback in a real-time context. We therefore retained the principle while slightly modifying the model for the following 24 h in order to have no more than an overall vision of the volume and constraints present the following day, thus avoiding a large increase in computation time.

We tested the differences between a 24-hour and a 48-hour optimization. A computation time limit of 30 min was specified in advance, and we compared the final discrepancy between the feasible solution and the best bound found. The storage tanks were required to be full at the end of the optimization, and the first time slot was fixed at 6 a.m. The different instances tested had different initial water levels in the tanks and the tests concerned different periods of the year or week, which impacts the level of consumption and certain energy-related constraints (price and maximum authorized power) (Tables 3, 4).

Table 3 24-hour optimization
Table 4 48-hour optimization

The first column indicates the instance tested: the initial level in the tanks, the season, the consumption prediction used for the first 24 h, and the consumption prediction for the latest 24 h (because the consumption is different during the week and during the week end). There are more instances for the 48-hour optimization because there are a larger number of possibilities when we are dealing with two separate days. The three other columns respectively show the final feasible solution obtained, the best bound, and the difference between these two values. These results are presented in the following graphic (Fig. 10):

Fig. 10
figure 10

Comparison of gap for a 24-hour and a 48-hour optimizations concerning 18 different instances

We can see that the 48-hour optimization converges faster on the best bound than the 24-hour solution. This is encouraging and will enable us to reduce the 30-min limit on computation time.

5 Conclusions

The aim of this study is to build a realistic mathematical model giving rise to pertinent, feasible solutions, where computation time has been reduced as much as possible and the solutions are of a high quality.

The use of integer variables in our model takes into account the discrete behavior of pumps, and the solutions obtained are therefore easy to implement. We also take new constraints into account. The next important step for us will be to increase the relevance of our model by taking fully into account a number of hydraulic phenomena that can be observed in the network.

We succeeded in reducing computation time and the discrepancy between the best bound and feasible solutions by changing some parameters in our optimization. Our intention is now to refine our optimization techniques and to introduce new ways of reasoning with the aim of reducing the solution space and the computation time. We also plan to look at other optimization tools (such as constraint programming) that may be able to solve the same model more efficiently.