1 Introduction

It is essential to reduce the carbon footprint to save non-renewable energy resources and protect our homeland. About 35 % of the water from the water distribution system (WDS) is wasted in most countries (Pasha and Lansey 2014; Zhang et al. 2016). Abnormal condition, such as the pipe burst, water hammer, causes terrible wastefulness in unreliable WDSs. Therefore, water resources management is now faced with a powerful challenge about reducing loss, and improving reliability.

Network partition is an ideal way to improve the quality, efficiency, and safety of the water supply. District Metered Area (DMA) model was presented in 1989, where WDS is divided into different partitions via opening or closing pressure-reducing valves (PRV) (Ciaponi et al. 2016; Giudicianni et al. 2020). Leakage can be reduced only if PRVs are operated through an effective plan (Ulanicki et al. 2003). With the development of sensor equipment, massive data collection becomes easier. Data mining has been applied as a basis of network partition for improving network reliability in WDS (Wu and Rahman 2017). In recent years, some researchers employed the k-means cluster and support vector machine model to partition the WDS, which optimized the locations of pressure monitoring sensors to better detect leakage (Zhang et al. 2016). Fuzzy c-means algorithm was used to zone hydrologic homogenous regions for the flood management (Kumar et al. 2019; Dehghanian et al. 2019). However, these method needs to reposition some sensors’ locations. The multi-scale random walk community detection (RWCD) was proposed to process clustering analysis without sensors reposition, which is a heuristic greedy model with modularity optimization (Lambiotte et al. 2014). But it is difficult to select the number of partitions unless optimization algorithms are employed for searching optimized solutions. Genetic algorithm (GA) as a single objective optimization algorithm, has successfully applied in the optimization of energy consumption (Savic et al. 1997). An extended reduced dynamic programming algorithm similar to GA was proposed to optimize energy consumption and maintenance cost in WDS (Zhuan and Xia 2013). Other algorithms such as the cuckoo search (CS) (Yang 2014), CS-GA hybrid algorithm (Ghodrati and Lotfi 2012) can also be used for optimizing WDSs. However, these above algorithms have a fatal weakness that cannot process multiple objective functions’ optimization. Non-dominated sorting genetic algorithm (NSGA) is a multiple objective optimization algorithm (Deb et al. 2002), which utilizes a non-dominated sort model to solve multiple objective problems Non-dominated sorting genetic algorithm-II similar to NSGA but an elitist strategy called crowded-comparison is presented, which improves the effect on variety selection (Srinivas and Deb 1994). The network assessment and leakage detection are feasible ways for accessing network reliability which have been successfully implemented by NSGA-II (Gupta and Kulat 2018). Therefore, NSGA-II is a high potential method for the optimization of the network.

Furthermore, a large-scale hydraulic calculation requires a lot of time and computing resource consumption (Zhang and Shen 2007; Mounce et al. 2017). EPANET software is a calculator for hydraulic calculations, which is programmed by the American Environmental Protection Agency (Rossman 2000), which can fast solve the hydraulic status via the extended period simulation (EPS), and also conveniently construct network partitions via opening or closing valves (Ostfeld et al. 2008). EPANET can be re-developed with other platforms such as MATLAB, C program, and JAVA. The Open Water Analytics (OWA) toolbox is a re-developed program using MATLAB platform (McDonnell et al. 2007), which creates a time-saving and programming convenient way for EPS by matrix operations.

For acquiring an ideal solution of a network partition, this study proposes a novel partitioning model. RWCD is first used to divide the WDS into different partitions. Then, WDS reliability is analyzed concerning five functions. Then, the NSGA-II optimization algorithm is applied to access network reliability by controlling boundary pipes via PRVs. In case studies, seven WDSs were selected to verify the practicability of the method, and the results were presented. Five advantages of this model are following as: (1) the number of partitions can be automatically selected by NSGA-II without manual filters; (2) The average pressures of nodes are used as RWCD’s variables that are suitable for representing hydraulic conditions; (3) Clear boundaries of partitions are shown by setting PRVs; (4) The results of optimization can cover network reliability in different abnormal situations. (5) The OWA software improves the calculating speed and readability of programs.

2 Network Partition

RWCD is used to divide WDS into different partitions for the first time in this study. This model was first presented by (Lambiotte et al. 2014) and references the community detection (CD) model (Delvenne et al. 2009, 2013). The CD employs modularity (Fortunato 2010) as an indicator to evaluate the clusters of undirected graph nodes. The most common modularity is the Girvan-Newman (GN) quality function. The equation can be simply expressed as follows:

$$ {Q}_{GN}={\sum}_{k=1}^K\left[\frac{E_k^i}{2m}-{\left(\frac{m_k}{2m}\right)}^2\right] $$
(1)

Where K is the number of clusters or called partitions for WDS; k is the kth partition; \({E}_{k}^{i}\) is ith links at the kth partition; mk is the total number of links at kth partition; m is total number of links.

Then, the GN quality function can be expressed by water demands formalism (Girvan and Newman 2002) for programming convenience.

$$ {Q}_{GN}=\frac{1}{2m}\sum \limits_{ij}\left({A}_{ij}-\frac{d_i{d}_j}{2m}\right)\delta \left(i,j\right) $$
(2)

Where Aij is the adjacent weight between link i and j and Aij is the weight of the link. if link i connects to link j, Aij = 1 or Aij = 0; di and dj is the number of links connecting to the node i and j, respectively; \(\delta (i,j)\) is the Kronecker delta function.

$$ \delta \left(i,j\right)=\left\{\begin{array}{c}0\kern0.5em \left(i\ and\ j\ belong\ to\ the\ same\ partition\right)\\ {}\ 1\kern0.5em \left(i\ and\ j\ belong\ to\ the\ different\ partition s\right)\end{array}\right. $$
(3)

For convenience, the WDS can be abstracted as an undirected graph. The links are regarded as the edges E={e1,e2,…,em} and the nodes are vertexes V={v1,v2,…,vn}. And m is the total number of links the same above. Similarly, n is defined as the total number of nodes. The connectivity of WDS can be described as an adjacent matrix A:

$$ A=\left(\begin{array}{ccc}{A}_{11}& \cdots & {A}_{1n}\\ {}\vdots & \ddots & \vdots \\ {}{A}_{n1}& \cdots & {A}_{nn}\end{array}\right) $$
(4)

The sum of jth rows is the total number of links attaching to the node j, and can be expressed by dj. Thus, the total number of links at each node can be defined D={d1,d2,…dn}T. In real WDS, reservoirs and tanks should be regarded as nodes, same as junctions, while valves, pumps are regarded as links like pipes. The combinatorial graph Laplacian can be expressed by L = D - A or L = D− 1/2LD1/2 (L is equal or greater than zero).

The random walk is a standard dynamic process at a discrete-time extending on the Markov stability formalism (Lambiotte et al. 2014) that can be defined as:

$$ {p}_{t+1}={p}_t\left[{D}^{-1}A\right]={p}_tM $$
(5)

Where p is a 1×n vectors of the probability representing the dynamics converge extent of an undirected graph; p will gradually converge to a stationary distribution solution π (\(\pi ={d}^{T}/2m\)); M is a transition matrix at the time t.

The discrete-time form can be as a derivative expression:

$$ \frac{dp}{dt}=-p\left[{D}^{-1}L\right] $$
(6)

Then, the WDS will be decomposed into different partitions. Each partition is encoded by the n×c indicator matrix H. c is the total number partitions; H is an index matrix (for example, Hij is the adjacent value between link i and j and Hij = 1 if link i connects to link j, or Hij = 0). If a matrix H is set, a clustered auto-covariance matrix of the diffusion process that evaluates WDS at time t can be expressed (Lambiotte et al. 2014):

$$ {R}_t={H}^T\Big[\mathit{\operatorname{diag}}\left(\pi \Big){M}^t-{\pi}^T\pi \right]H $$
(7)

Where Rt is a c×c matrix that reflects the probability of the random walk to divide different partitions for each node; Mt is the transition matrix of time step t, which can be expressed as:

$$ M={D}^{-1}A=-{D}^{-1} Lt $$
(8)

Provide that the minimum of Rt is at the time s, the Markov stability of the results for partitions is (Martelot and Hankin 2012):

$$ {r}_t={\min}_{0\le s\le T} Tr\left[{R}_s\right] $$
(9)

Where Tr is the trace of matrix; rt is the minimum Rt; It is notice that the r1 (t = 1) is the same as trace of Eq. 2:

$$ {r}_1= Tr\left[{H}^T\left(\frac{A}{2m}-{\pi}^T\pi \right)H\right]= Tr{Q}_{GN} $$
(10)

Thus, the discrete-time t = 1 is a particular form of GN modularity.

For converging to the same stationary point as the discrete-time simple random walk, an improved GN modularity can be generated according to the random walk method (Lambiotte et al. 2014):

$$ {r}_t= Tr\left\{{H}^T\left[\mathit{\operatorname{diag}}\left(\pi \right){e}^{-{tD}^{-1}L}-{\pi}^T\pi \right]H\right\} $$
(11)

Where t is the time of random walk.

For fitting this modularity to the WDS, each value vi at the vector V should be redefined as the pressure of the nodes. The matrix A is the average pressure of two nodes:

$$ {\mathrm{A}}_{\mathrm{ij}}=\frac{{\overline{P}}_i+{\overline{P}}_j}{2} $$
(12)

Where \({\stackrel{-}{P}}_{i}\) and \({\stackrel{-}{P}}_{j}\) are average pressures at the node i and j, respectively, which can be solve as:

$$ {\overline{P}}_i={\sum}_{t=1}^T\frac{P_i}{T} $$
(13)

Where T is the total EPS simulation duration.

The total number of nodes d can be revised as the sum of average pressures defined a={a1,a2,…an}T. ai is the sum of ith column of matrix A.

Thus, the improved GN modularity for WDS is modified as:

$$ {r}_t= Tr\left\{{H}^T\left[\phi {e}^{-{t\psi}^{-1}Y}-{\eta}^T\eta \right]H\right\}= Tr\left[{H}^T fH\right] $$
(14)

Where \(\eta=a^T/2M\); M is equal to a half of the sum of all rows and columns of matrix A. ψ is the diagonal matrix of E; Y is a matrix and equal to ψ-A; φ is the diagonal matrixη; f is equal to the \(diag\left(\pi \right){e}^{-{t\psi }^{-1}Y}-{\eta }^{T}\eta\).

H is a unit matrix at the start time. The index matrix assumed node 1 is assigned from partition 1 to partition 2. Then, the elements H11 changes from 1 to 0 while H12 is changing from 0 to 1. Analogizing that, node i belongs to partition j, and partition k is a neighboring partition. Then, the elements Hij changes from 1 to 0 while Hik is changing from 0 to 1 when node i was assigned to partition k.

The variation of modularity can express as follows:

$$ \Delta {\mathrm{r}}_{\mathrm{i}}\left(j\to k\right)={r}_i(k)-{r}_i(j) $$
(15)

Where ri(j), ri(k) are the modularity when the node i belongs to the partition j and partition k, respectively.

For a better understanding of the RWCD (Blondel et al. 2008), a flow chart of this algorithm is illustrated in Fig. S1 in Supplementary Information.

3 Network Reliability Optimization

WDS has been divided into a different number of partitions after the network partition process with respect to the length of iteration times. However, it is unknown which the best partition is, and the number of boundary pipes to be opened or closed. It is necessary to select a partition plan and explore whether the boundary pipes between partitions are opened or closed. Therefore, NSGA-II is re-developed as a multi-objective function optimization method for solving this problem. In this analysis, five objective functions are employed to optimize WDS for balancing pressure and reducing leakage as below.

3.1 Mechanical Reliability Estimation

The mechanical failure has a severe influence on WDS, including the pipe burst, pump damage, and valve out of control, which will consume a huge carbon footprint to offset these losses. The evaluation of mechanical reliability becomes crying needs.

The pipe burst is the most severe accident in WDS. Mechanical reliability estimation (MRE) is presented by (Su et al. 1987; Sirsant and Reddy 2020), which provides a solution to evaluate the stability of pipes. MRE can calculate pipes break based on the different DMA partitions as the various zones have different probabilities of breaks.

The pipe burst is relative to the lengths and diameters of a pipe. The probability of the pipe break is assumed as:

$$ {p}_i=1-{e}^{-{\beta}_i} $$
(16)

Where pi is the break probability of pipe i; βi is the expected number of failures of pipe i which can be solved as:

$$ {\beta}_i={r}_i{L}_i $$
(17)

Where ri is the per year per unit length expected number of failures of pipe i which can be expressed as Eq. 18; Li is the length of pipe i.

$$ {r}_i=\frac{0.68}{d_i^{3.26}}+\frac{2.72}{d_i^{1.31}}+\frac{2.77}{d_i^{3.58}}+0.04 $$
(18)

Where di is diameter of pipe i.

Finally, the MRE equation can be solved as:

$$ F=\frac{1}{N_p}{\sum}_{k=1}^{N_p}{p}_k\frac{D-{S}_k}{D} $$
(19)

Where F is probability of pipe break of the integrated WDS; Np is the number of partitions of the WDS; pk is the break probability of the partition k expressing as Eq. 20; D is the total water demand of WDS; Sk is the total demand of partition k.

$$ {p}_k={\prod}_{i=1}^{N_{knode}}{p}_i $$
(20)

Where Nknode is the number of nodes at the partition k.

3.2 Hydraulic Reliability Estimation

The hydraulic reliability estimation (HRE) method is employed as the second objective function, which is proposed by (Gupta and Bhave 1994). In this method, the water demands of nodes are assumed as unknown, independent values. Then, random water demands are distributed to the nodes of WDS in a normal distribution. For improving the stability, some coefficients are appended to the water demands. Five different cases, 0.1, 0.2, 0.4, 0.6, 0.8, and 1.0 are considered as factors, which relate to the water requirement of users (Sirsant and Reddy 2020).

The hydraulic calculating is simulated after the water demands are allocated to nodes. Then, node pressures are obtained. The HRE can be solved by:

$$ {R}_h={R}_v{F}_t{F}_n $$
(21)

Where Rh is HRE factor; Rv is volume reliability factor as Eq. 22; Ft is the time factor as Eq. 24; Fn is the node factor as Eq. 25.

$$ {R}_v=\frac{\sum_i{\sum}_j{q}_{ij}^r{t}_i}{\sum_i{\sum}_j{q}_{ij}^b{t}_i} $$
(22)

Where \({q}_{ij}^{r}\) is the real water demand of node j at the ith time as Eq. 23; \({q}_{ij}^{b}\) is the base water demand of node j at the ith time; ti is a period of EPS during ith time.

The real water demand is calculated as:

$$ {q}_{ij}^r=\left\{\begin{array}{cc}{q}_{ij}^b& {H}_{ij}^r\ge {H}^d\\ {}{q}_{ij}^b\left(\frac{H_{ij}^r-{H}^m}{H^d-{H}^m}\right)& {H}^m\le {H}_{ij}^r\le {H}^d\\ {}0& {H}^r<{H}^m\end{array}\right. $$
(23)

Where \({H}_{ij}^{r}\) is the real pressure of node j at the ith time; Hm is the minimum require pressure; Hd is the desired pressure.

The time factor Ft can be expressed as:

$$ {F}_t=\frac{\sum_i{\sum}_j{\delta}_{ij}{t}_i}{N_{node}T} $$
(24)

Where \({\delta }_{ij}\) is a coefficient. If \({q}_{ij}^{r}\ge {q}_{ij}^{b}\), \({\delta }_{ij}\) is one otherwise is zero; Nnode is the number of nodes; T is the total time during the EPS.

Fn can be expressed as the function of node reliability:

$$ {F}_n={\left({\prod}_{j=1}^{N_{node}}{R}_{nj}\right)}^{\frac{1}{N_{node}}} $$
(25)

Where \(\prod {R}_{nj}\) is the product of all the node reliability.

Fn is the node factor, which represents the average of the node reliability Rn. The node reliability of node j can be solved as:

$$ {R}_{nj}=\frac{\sum_i{q}_{ij}^r{t}_i}{\sum_i{q}_{ij}^b{t}_i} $$
(26)

Where Rnj is the node j reliability factor during the EPS.

3.3 Flow Entropy Function

Flow Entropy Function (FEF) was firstly presented by (Shannon 1948), and applied in the WDS as an evaluation index for accessing the stability of fluctuating water demands (Tanyimboh and Templeman 1993; Sirsant and Reddy 2020). FEF is employed as the third objective function for preventing water hammer. The resiliency index can be calculated using the following equation:

$$ S={S}_s+{\sum}_j^{N_{node}}{P}_j{S}_j $$
(27)

Where Ss is the entropy of water supply sources containing reservoirs and tanks, which can be solved by Eq. 28; Pj is the fraction of inflow the node j and the total water demand as Eq. 30; Sj is the entropy of junctions as Eq. 31.

$$ {S}_s=-{\sum}_i^{N_s}{\sum}_j^{N_{sp}}{p}_{ij}{\ln}^{p_{ij}} $$
(28)

Where Ns is the number of water supply sources; Nsp is the number of nodes connected with water supply sources; pij is the fraction of pipe ij outflow and total outflow from node i to node j, expressing as:

$$ {p}_{ij}=\frac{q_{ij}}{Q_{oi}} $$
(29)

Where qij is the external outflow of from node i to node j; Qoi is the total outflow of node i.

The fraction of inflow can be expressed as:

$$ {P}_i=\frac{Q_i}{Q} $$
(30)

Where Qi is the total internal inflow of node i; Q is the total water demand.

The entropy of junctions can be expressed as:

$$ {S}_j=-{\sum}_i^{N_{node}}{\sum}_{j\left(j\ne i\right)}^{N_{node}}{p}_{ij}{\ln}^{p_{ij}} $$
(31)

3.4 Network Resilience

Network resilience (NR) is proposed by (Todini 2000; Prasad and Park 2004), which reflects the internal energy dissipation. Huge power dissipation may be caused when pipe diameters change. NR is employed as the fourth objective function to avoid power loss of WDS. NR can be solved as:

$$ {I}_n=\frac{\sum_{i=1}^{N_{node}}{U}_i{q}_{\mathit{\min},i}\left({H}_i-{H}_{\mathit{\min},i}\right)}{\sum_{i=1}^{N_s}{q}_{R,k}{H}_{R,k}-{\sum}_{i=1}^{N_{node}}{q}_{\mathit{\min},i}{H}_{\mathit{\min},i}} $$
(32)

Where Ui is the uniformity factor of node I, expressing as Eq. 33; qmin,i is the minimum water demand at node i during the EPS period; Hmin,i is the minimum pressure at node i during the EPS period; qR,k is the outflow of reservoir k; HR,k is the pressure of reservoir k;

$$ {U}_i=\frac{\sum_{j=1}^{N_i}{d}_j}{N_i\max \left\{{d}_1,\dots, {d}_{N_i}\right\}} $$
(33)

Where dj is the diameter of link j; Ni is the number of pipes connected with node i.

3.5 Connectivity Optimization

For better water resources management, the number of boundary pipes between two partitions needs to be minimized.

Thus, the number of boundary pipes between two partitions is selected as the fifth objective function.

$$ f={\sum}_{i=1}^{N_p}{\sum}_{j=1}^{N_{jnode}}{s}_{ij} $$
(34)

Where sij is the boundary pipe j at ith partition. When the valve is open, sij is one, otherwise zero; Njnode is the sum of boundary pipes at the partition i; f is the sum of opened boundary pipes.

sij is divided into l groups for analysis. For example, with ten boundary pipes and four partitions and each partition has three boundary pipes in WDS, if these pipes status are opened or closed, it can be written in a binary style: [1,1,1,0,1,1,0,1,1,1,0,1]. The first three elements are the first partition boundary pipes which valves in this pipe are all opened. The next three elements are second partition boundary pipes, in which the first valve is closed. The rest can be explained in the same manner. Also, boundary pipes are sort of by their index.

3.6 Non-dominated Sorting Genetic Algorithm-II

NSGA-II is a multiple objective function optimization model utilizing non-dominated sort search method.

In the WDS, the equation of continuity and the conservation of energy should be obeyed. Two constraints need to be considered:

$$ {H}_{min}^{\mathrm{i}}\le {\mathrm{H}}^{\mathrm{i}}\le {H}_{\mathrm{max}}^i $$
(35)
$$ {L}_{min}^i\le {L}^i\le {L}_{max}^i $$
(36)

Where \({H}_{\text{m}\text{a}\text{x}}^{i}\) is the maximum of ith node pressures during the EPS periods; \({H}_{min}^{i}\) is the minimum of ith node pressures; \({L}_{min}^{i}\) and \({L}_{max}^{i}\) are the minimum and maximum of ith tank water levels, respectively.

NSGA-II is employed to optimize five objectives above, and some revised for this algorithm is as follows:

All nodes belong to same partition are imported into a matrix to facilitate management. Secondly, all pipes is expressed as both ends nodes index. For example, Lij is the pipe that ends nodes are i and j. Find the boundary pipes between two partitions by identification of the pipe’s ends nodes if belong to different partitions. Next, initial variables are set randomly as the connected states of pipes (opened or closed). For instance, if the number of partitions is 3, and boundary pipes are 5, the connected states matrix is:

$$ \mathrm{X}=\left[\begin{array}{ccc}{s}_{11}& \cdots & {s}_{15}\\ {}\vdots & \ddots & \vdots \\ {}{s}_{31}& \cdots & {s}_{35}\end{array}\right] $$
(37)

Where sij is the status of pipe j for population (partition) i.

The objective function can be expressed by:

$$ \min\ \left\{{f}_1,{f}_2,{f}_3,{f}_4,{f}_5\right\}=f\left(\boldsymbol{x}\right) $$
(38)

Where f1 is the reciprocal of MRE as Eq. 19; f2 is the reciprocal of HRE as Eq. 21; f3 is FEF as Eq. 27; f4 is the reciprocal of NR as Eq. 32; f5 is the connectivity function as Eq. 34. x is an independent vector containing water demands of nodes, partition numbers, nodes pressures, flows of pipes, and boundary pipes status.

Using Eq. 37 to solve Eq. 38, if the results meet the requirement of Eqs. 3536, this results will be retained, otherwise function values are set to infinity.

Then, non-dominated sort was processed (Srinivas and Deb 1994) after all populations have been solved. The five objective functions generates a matrix F, which can be defined as:

$$ F=\left[\begin{array}{ccc}{f}_1^1& \cdots & {f}_5^1\\ {}\vdots & \ddots & \vdots \\ {}{f}_1^3& \cdots & {f}_5^3\end{array}\right] $$
(39)

Where \({f}_{i}^{j}\) is the value of the objective function fi in the population of j.

Then, the domination count initial as np=0 and search one of the best population and set it to F1. The rank of F1 is set to one. Next, traverse F1 and find the domination solutions Si for each i. Traverse each Si all the individual nl and then execute np=np -1. If nl = 0, save it into a set of Q. Then, F2 = Q, and repeat this process until Fn is a null set.

For instance, assign values for Eq. 39:

$$ \mathrm{F}=\left[\begin{array}{c}3\\ {}4\\ {}5\end{array}\kern0.5em \begin{array}{c}4\\ {}5\\ {}5\end{array}\kern0.5em \begin{array}{c}6\\ {}7\\ {}8\end{array}\kern1.25em \begin{array}{c}6\\ {}8\\ {}9\end{array}\kern0.5em \begin{array}{c}5\\ {}9\\ {}10\end{array}\right]{\displaystyle \begin{array}{c}\left\{1\right\}\\ {}\left\{2\right\}\\ {}\left\{3\right\}\end{array}} $$
(40)

All objective function values in {1} are less than {2}, {3}. The n1 is equal to 0 and (2), (3) are combined to the S1=[{2},{3}] and F1={1}.

Then, mark each element of {1} to rank (1) Similarly, {2} is less than {3} but larger than {1}, so n2 is 1 and S2 = {3}. The {3} is larger than {1}, {2}, so n3 is equal to (2) F1 is not a null set, so traverse F1. S1 = [{2},{3}] and n2-1 = 1–1 = 0, so Q={2}. Mark element of {2} to rank 2, but the n3-1 = 2 − 1 = 1, reject it. F2 = Q and traverse F2. S2 = {3}, n3 − 1–1 = 2–1–1 = 0, so Q={3} and mark element of {2} to rank (3) F3 = Q and traverse F3. Then, S3 is a null set, so Q is a null set. F4 = Q is null. {1}, {2} and {3} are ranked as 1, 2, and 3.

After populations ranking, the crowding distance should be assigned to the whole population. For each population, the values should be sorted in rank, allowing calculation of the crowding distance for the population at each rank and in descending order with distance. The crowding distance equation is as follows:

$$ {n}_d(i)={n}_d(i)+\left[{f}_m\left(i+1\right)-{f}_m\left(i-1\right)\right] $$
(41)

Where nd is the crowding distance; fm is the objection function m.

Next, the new rank and distance should be calculated, for each population position to be updated, using copy, cross, and mutation method like simple genetic algorithm.

Then, combine half of the old population with half of the new population, which are all half of higher sort elements.

For instance: assuming the old population is:

$$ \mathrm{F}=\left[\begin{array}{c}3\\ {}4\end{array}\kern0.5em \begin{array}{c}4\\ {}5\end{array}\kern0.5em \begin{array}{c}6\\ {}7\end{array}\kern1.25em \begin{array}{cc}\begin{array}{c}6\\ {}8\end{array}& \begin{array}{c}5\\ {}9\end{array}\end{array}\right] $$

Where the first of row is rank 1 and second is rank 2.

Assuming the new population is:

$$ \mathrm{F}=\left[\begin{array}{c}3\\ {}6\end{array}\kern0.5em \begin{array}{c}5\\ {}8\end{array}\kern0.5em \begin{array}{c}5\\ {}9\end{array}\kern1.25em \begin{array}{cc}\begin{array}{c}6\\ {}8\end{array}& \begin{array}{c}5\\ {}9\end{array}\end{array}\right] $$

Where the first of row is rank 1 and second is rank 2.

Then, combine first of row of old population and first of row of new population and reject others. A new combination can be produced:

$$ \mathrm{F}=\left[\begin{array}{c}3\\ {}3\end{array}\kern0.5em \begin{array}{c}5\\ {}4\end{array}\kern0.5em \begin{array}{c}5\\ {}6\end{array}\kern1.25em \begin{array}{cc}\begin{array}{c}6\\ {}6\end{array}& \begin{array}{c}5\\ {}5\end{array}\end{array}\right] $$

Finally, non-dominated sort will be continued processing with new population until reaching the maximum iteration times.

The process of optimization is illustrated in Fig. S2. In the RWCD process, the number of partitions is equal to the number of iterations. Thus, a huge number of partitions will be obtained. Then, some partition plans that exist nodes of disconnected regions or repeated plans should be removed. The depth-first search algorithm is employed to judge the connectivity of all nodes at each partition (Rodrigues et al. 2019). Then, all selected partitions plans are inputted to the NSGA-II model, using five objective functions to optimize. Finally, the best partition plan can be selected manually with network reliability.

4 Example of the Network Partition With Net1

An example is employed called Net1 (Hwang and Lansey 2017), which has 9 junctions, 1 reservoir, 1 tank, 12 pipes, and 1 pump. The average water demand is 249.81 m3/h. The network topology is shown in Fig. S3.

Eleven nodes in Net1 are firstly distributed to 11 partitions shown in Fig. S4 (a). The average pressures of nodes are solved by EPS, illustrated in Table S1. Then, each weight Aij is solved by Eq. 12 if two nodes are connected, or is zero, illustrated in Fig. S4 (b). There are 13 weights of non-zeros including A12, A1,10, A23, A25, A34, A36, A3,11, A47, A56, A58, A67, A69, A89 and the matrix is a symmetric matrix (Aij=Aji). Thus, 26 elements are non-zeros. The sum of all rows and columns of matrix A is 2 M = 2066.96. M is equal to 1033.48. The vector a is equal to the sum of each row of A matrix.

$$ \mathbf{a}={\left[128.66,253.38,\mathrm{316.52,171.53},252.27,338.62,172.51,64.69,163.64,43.45,61.92\right]}^{\mathrm{T}} $$

The η solves by \({a}^{T}/2 M\), η=[0.062, 0.123, 0.153, 0.083, 0.122, 0.164, 0.083, 0.080, 0.079, 0.021, 0.030]T. ψ and φ are diagonal matrix a andη, which are obtained via replacing the elements to the diagonal lines of matrix.

The matrix H is initialized and generates an 11*11 unit matrix. The GN modularity matrix is solved by Eq. 14 and results are illustrated in Table S2. The range of step time is from 0.5 to 100 and increases at 0.5 per step. The trace of this matrix \({r}_{1}\left(1\right)\) is -0.6240, when the time is 0.5.

Then, node 1 is transferred from partition 1 to partition 2. The elements H11 changes from 1 to 0 while H12 is changing from 0 to 1 when node 1 was assigned to partition 2.

Then, GN modularity is recalculated at the new partition by Eq. 14. The trace of this matrix \({r}_{1}\left(1\right)\) is -0.6536. Thus, variations of modularity as node 1 is assigned from partition 1 to partition 2 can be solved by Eq. 15. \({\Delta }{\text{r}}_{1}\left(1\to 2\right)={r}_{1}\left(2\right)-{r}_{1}\left(1\right)\)=-0.0296. Similarly, the variation as node 1 from partition 1 to partition 3 to 11 are − 0.027, -0.140, -0.077, -0.010, -0.139, -0.145, -0.146, -0.215, and − 0.225. The maximum of the GN modularity is\({\Delta }{\text{r}}_{1}\left(1\to 6\right)=\)-0.010<0. The partition of node 1 does not transfer to the other partition for the next iteration. Others nodes’ variations of GN modularity matrix are shown in Table S3, when the step time is 0.5. No one variation is larger than zero with nodes 10, 11, in which the first iteration will not change the partition of any nodes. Nodes (index of 2–9) have some elements greater than 0. For instance, the maximum of the variation of node 2 is \({\Delta }{\text{r}}_{2}\left(2\to 3\right)\)>0. Node 2 is transferred from partition 2 to partition 3. Finally, a new partition solution is obtained, which has five partitions [{1}, {2,4,6}, {3,5,7,8,9}, {10}, {11}], shown in Fig. S5 (a).

The second step is processed when the time is 1.0 (0.5 + 0.5). The final partition solution is obtained at time 29.5, after iterations of 59 times, shown in Fig. S5 (b). Two partitions are obtained: [{1,2,10}, {2,3,4,5,6,7,8,9,11}]. Boundary pipes are {2–3}, {2–5} between two partitions, where we can set valves to control the connectivity.

After the network partition by RWCD, 2000 partition plans would be obtained. Four different conditions including six, five, three, two partition states are retained, which are shown in Fig. S6.

NSGA-II is employed to optimize MRE, HRE, FEF, NR, and the number of connectivity. The highest and lowest pressures are 140 m and 110 m, respectively. The maximum and minimum levels of the tank are 45.72 m and 30.48 m, respectively. The results are shown in Table 1 that the reliability of WDS is improved after the partition process. The maximum opened boundary pipes are 8, 7, 4, and 2, respectively at six, five, three, two partition conditions of Fig. S6 (a)-(d). The best result can be selected in 2 or 3 partition conditions. HRE at 3 numbers of partitions is greater than 2, and MRE is nearly equal to 2 numbers of partitions, and FEF also greater than 2 numbers of partitions, which indicated that the leakage and water hammer broken will occur in a higher probability in 2 partitions condition. Moreover, only one boundary pipe will cause an unsafe status and cut off contact between two partitions if the pipe breaks. Therefore, the best number of open valves was 2 (L2,3 and L3,6, where 2, 3, and 6 are the index of ends nodes), where MRE, HRE, FEF, NR have the best-optimized values at the 3 partitions condition. Two PRVs should be installed in the second and third boundary pipes (L2,5 and L6,7 where 2, 3 and 6, 7 are the index of ends nodes) and closed.

Table 1 The results of the best solution at different partition states

5 Case studies

Seven cases of WDS were employed to verify the practicability of this method. Cases of frequently-used WDSs are employed, including Net3 (Hwang and Lansey 2017), EXA7 (Bentley Systems 2013), three Kentucky (KY) systems (Jolly et al. 2014) from the Kentucky statewide database, and BWSN developed by (Ostfeld et al. 2008; Amin et al. 2013). In addition, a real case, City F in the south of China is applied to reveal real effectiveness. The parameters of these WDSs are listed in Table S4. The topologies of WDS are shown in Fig. S7S13. The time interval of hydraulic calculation is set to one hour in the EPS process and the maximum time is set to 24th hour. The OWA toolbox is employed for all hydraulic calculations. The range of partition step time is from 0.5 to 100 and increasing 0.5 per step. The computer devices are: CPU is i7-9700K; RAM is DDR4-96G; ROM is intel-SSD-1T; The operating system is Windows 10, and Matlab-2018b is used.

For the case of Net3, each step of GN modularity is first calculated, then network partition is processed. The process of iterations of random walk is illustrated in Fig. S14. The highest and lowest pressures are 200 m and 100 m. The maximum and minimum levels of tanks (name IDs are 1, 2, and 3) are (0 m, 9.78 m), (1.98 m, 12.28 m), and (1.22 m, 10.82 m), respectively. The partition numbers will stop variation when the step is 356. After 2000 steps, Net3 is divided into 14, 10, 9, 8, 7, 6, 5, 4, and 3 partitions at the step of 1, 2, 3, 5, 7, 8, 15, 24, 49, and 356. Then, NSGA-II is employed to optimize pipes and select the best partitions. The populations and maximum iteration time are 20 and 50. The solutions of optimization are manually filtered. Six high potential plans are selected, illustrated in Table S3. The partition is shown in Fig. S15. The four partition plan in Table S5 has the largest MRE, HRE, and NR while FEF is small enough (MRE, HRE, FEF, NR are 1.0000, 0.0908, 1.5183, 0.0051), which indicated that the leakage and water hammer will occur in a lower probability. Therefore, the best solution is four partitions and the two of six boundary pipes (name IDs are 120 and 240) should be opened and PRVs should be set to others, shown in Fig. 1.

Fig. 1
figure 1

The best network partition plans of Net3

The other topological graph can be solved in the same way. For the cases of EXA7, KY3, KY5, KY11, city F, and BSWN Numbers of 15, 5, 5, 12, and 9 partitions plans are obtained by random walk, shown in Fig. S16-S21. The highest and lowest pressures are 80 m and 10 m with EXA7, respectively. The maximum and minimum levels of the tank (name ID 882) are 12.19 m and 9.78 m, respectively. The solutions of NSGA-II optimization are manually filtered, illustrated in Table S6. The best solution of EXA7 has eight partitions and nine boundary pipes should be opened, shown in Fig. S22. The name IDs of these boundary pipes are 421, 504, 614, 617, 708, 733, 768, 769, 821. The other boundary pipes should be closed. The highest and lowest pressure of KY3, KY5, and KY11 are the same as 100 m and 40 m. Three tanks including T-1, T-2, T-3, maximum and minimum levels are (42.88 m, 48.98 m), (44.88 m, 52.50 m), (28.69 m, 39.36 m) in KY3. After NSGA-II optimization, the solutions of NSGA-II optimization are manually filtered and high potential plans are selected, illustrated in Table S7. The best result of KY3 has three partitions with 4 opened boundary pipes, shown in Fig. S23. The name IDs of pipes are P-151, P-341, P-342, and P-346. For the case of KY5, tanks including T-1, T-2, T-3, maximum and minimum levels are (17.56 m, 25.18 m), (7.18 m, 17.85 m), and (8.19 m, 18.85 m), respectively. The results of NSGA-II optimization are illustrated in Table S8. The best result of KY5 has six partitions with 7 opened boundary pipes, shown in Fig. S24. The name IDs of pipes are P-13, P-151, P-226, P-400, P-447, P-458, P-63. For case of KY11, 25 tanks maximum and minimum levels are (0.00, 4.67), (0.00, 3.96), (30.07, 39.21), (12.22, 20.15), (1.87, 11.92), (3.80, 10.50), (13.83, 20.54), (28.02,35.64), (18.22, 21.27), (22.38, 30.00), (0.00, 9.34), (33.89, 38.46), (30.74, 39.89), (25.99, 33.61), (24.47, 30.56), (0.13, 8.66), (41.03, 51.69), (0.00, 7.54), (18.65, 26.57), (34.33, 43.17), (21.19, 25.76), (0.00, 8.00), (25.01, 31.10), (32.94, 39.65), (15.70, 21.80), (12.18, 19.19), (14.47, 27.28), (6.35, 10.92) meters. The optimized results are illustrated in Table S9. The final result of KY11 has nine partitions with 7 opened boundary pipes that name IDs of pipes are P-101, P-111, P-127, P-128, P-13, P-152, P-205, shown in Fig. S25. For the case of city F, no tank exists in the topological graph. Thus the restriction of tank level can be ignored. The optimized results are illustrated in Table S10. The final result of city F has ten partitions with 22 opened boundary pipes that name ID are 89,825, 107,851, 145,471, 154,060, 156,025, 156,761, 156,801, 157,093, 157,778, 157,956, 158,096, 158,097, 158,618, 158,730, 158,997, 159,144, 159,154, 159,163, 159,187, 159,192, 159,194, 159,236, shown in Fig. S26. The results of BSWN for optimization are illustrated in Table S11. Two tanks’ maximum and minimum levels are (0 m, 9.78 m) and (0 m, 12.77 m). The final result of BSWN has three partitions with 2 opened boundary pipes that name IDs are LINK-41 and LINK-85, shown in Fig. S27.

MRE of cases of Net3, EXA7, KY3, KY5, KY11, city F, and BSWN all increase by 0.051, 0.021, 0.047, 0.045, 0.131, 0.004, and 0.030, respectively. HRE all increase by 0.090, 0.286, 0.3696, 0.373, 0.131, 0.235, and 0.003. NR are all increased by 0.004, 0.001, 0.014, 0.019, 0.074, 0.001, and 0.008. Moreover, FEF reduce by 0.419, 2.246, 2.155, 0.897, 2.121, 39.428, and 0.896. The results demonstrate that the FEF values at each partition for each specific case is decreased and MRE, HRE, and NR are improved, thus this method has a great effect on network partition. Meanwhile, each partition can be effective controlled via PRVs.

Above all, comparing with the conditions of no partition, these seven cases all show the improvement of network reliability. Energy and economic losses are decreased due to the reduced leakage and water hammer phenomenon.

6 Summary and Conclusion

An automatic network partition is presented. RWCD model is employed to divide WDS into various partitions according to the node pressures at first. Five functions including MRE, HRE, FEF, NR, and the number of boundary pipes are used as objective functions to access the reliability of a network. Then, the NSGA-II algorithm is applied to optimize WDSs conditions, with respect to boundary pipes and partitions. A high potential strategy of partitioning can be obtained via setting PRVs to the boundary pipes. An example of the Net1 topological graph is employed to briefly explain this process. This method is also verified by seven cases of WDS. The OWA toolbox was used as a hydraulic calculation tool for these EPS processing.

Base on the results obtained the following observations can be made:

  1. (1)

    Network reliability of all of WDS can be improved after network partition comparing with no partition condition, which reduces leakages and water hammers.

  2. (2)

    The best partition plan can be selected by NSGA-II without manual filters.

  3. (3)

    Average pressure is an effective variable for partitioning.

  4. (4)

    A PRV setting to close pipe can get a clear boundary of partitions for convenient management.

Therefore, this method is a practicability method for water resources management.

However, some weaknesses might be found in this method:

  1. (1)

    The more iterations and the fewer steps, the more partition plans will be generated, which may spend much time searching eligible plans in the RWCD model. Meanwhile, the more partition plans, the more optimization processes should be measured. Thus, a time-saving modification should be developed.

  2. (2)

    The maximum of the MRE accessing function is 1.0 (100 %). Some cases have optimized it to up to maximum value and cannot improve further. Thus, a better evaluation function should be invented.

  3. (3)

    In addition, the hydraulic condition of WDS is only considered as independent variables in the random walk partitioning model. Water quality is also an important indicator for WDS that might also be taken into account.

Thus, researching and improving this method should continue in the future.