Introduction

With global industrialization and urbanization, drinking water pollution incidents occur frequently. Through real-time monitoring of the urban water supply network, relevant departments can take measures when water pollution occurs to avoid economic losses and social panic. The network system of drinking-water sensors can monitor the water quality in real-time. Once the pollution is found, the pollution source should be located in real time to reduce the pollution range, which is essential to ensure the safety of drinking water (Hu et al. 2018a, b, 2020; Gu et al. 2017, 2019a, b, c; Gu et al. 2017;Yan et al. 2020). Semih Kuter et al. use satellite data and snow classification for water resource management (Kuter et al. 2018, 2019).

Recently, the water supply network system detects the concentration of pollutants through the arranged water quality sensors and gives early warning. According to the monitored time-varying pollution concentration, the real-time location of the pollution sources reduces the scope of pollution sources in real-time, to locate the real pollution sources. The increase of pollution time and concentration data is more restrictive to the optimal solution, which makes it easier to find the pollution sources. Once the pollution is detected, the real-time location of the pollution sources can optimize the potential pollution sources and locate the real source. Besides, reducing the diffusion time of pollution in the water supply network, it ensures the safety of citizens’ water use and reduces economic losses and negative impacts.

The algorithms of pollution sources location can be divided into particle backtracking, machine learning, and simulation-optimization. In early research, PBA was widely used to deduce pollution sources by tracking the state of the last time point (Shang et al. 2002; Laird et al., 2005, 2006). Then, machine learning and simulation-optimization appeared one after another. The machine learning algorithms calculate the pollution probability of each node in the water supply network to determine the highest probability is the pollution source. The simulation-optimization methods are transform the pollution sources lcocation problem into the optimization problem and use the intelligent optimization algorithms to solve it. The intelligent optimization algorithms have widely applications (Yang et al. 2018, Li et al. 2019, Wang et al. 2020a, b, 2021). Ayse Özmen, Fatma Yerlikaya-Zkurt, and their groups proposed the multivariate adaptive regression splines (MARS); it is a form of non-parametric regression analysis for building high-dimensional and nonlinear multivariate functions and applied in many fields of science, engineering, technology, finance, and control design in recent years. It is a modern methodology of statistical learning, data mining, and mathematical estimation theory which is important in both regression and classification (Özmen, 2016; Roche and Yalcinkaya, 2019, Yerlikaya-Zkurt and Taylan, 2020).

Huang et al. (Huang and Edward, 2009) considered the uncertainty of water demand, sensor measurement and modeling, so the algorithms of data mining combined with the maximum likelihood process are used to predict the location and time of pollution sources. Perelman et al. (Perelman and Avi, 2013first used a clustering algorithm to simplify the water supply network system and then combined with a Bayesian network to predict the most likely pollution sources through probability.

Wang and Xin (Wang and Xin 2013) used the Markov Chain Monte Carlo Algorithm to predict the characteristics of pollution sources due to the uncertainty of the water supply network system. On this basis, Wang and Harrison (Wang and Harrison 2014)  used a support vector regression algorithm to speed up the likelihood estimation to reduce the calculation time. Wagner et al. (Wagner et al. 2015) proposed a method based on ad-joint probability to identify the location, time, and quality of pollution sources. Such methods can not accurately locate the pollution source, but only give the possibility that the node is a source of pollution. However, with the increased pipe network scale, the calculation complexity of node probability rises dramatically.

Seth and co-workers (Seth et al., 2016) compared the algorithms for solving pollution source locations, which are the Bayesian probability method, pollution source state-based method, and simulation optimization method, where the last one can solve most pollution intrusion events. The simulation-optimization process continually searches for the optimal solution by comparing the similarity between the actual monitoring data of sensors and the simulation data set of EPANET. Avi and Elad (Avi and Elad 2005) used the genetic algorithm to maximize the random pollution matrix to search for pollution sources. Guan et al. (Guan et al. 2006) designed the optimization model as a corrector to identify the similarity between the simulation response data and the measured data of the monitoring point and then determine the pollution source. Zechman et al. (Zechman and Ranjithan, 2009) used the global evolution strategy algorithm based on tree coding format to search for pollution sources. Lv et al. (Lv et al. 2010) used a simulation-optimization algorithm to compare the operation results of different input parameters, and the accuracy of successfully locating pollution sources reached 93.3%. Hu et al. (Hu et al. 2015) quantified the non-uniqueness of the localization solution of pollution sources. They proposed a parallel niche genetic algorithm based on MapReduce, which calculated the location, quality, and time vectors of pollution sources. Yan and other researchers (Yan et al. 2017a, b, 2018, 2019a; Gong et al. 2019) transformed the problem of pollution source location into different optimization problems and solved them with different intelligent optimization algorithms.

Based on the pollution source location, the real-time location is studied. De et al. (De et al. 2010) developed the alternative practical method to locate pollution sources in real-time by judging the consistency of the candidate source node's pollution possibility state and time interval with the sensor data. Liu et al. (Liu et al. 2011) proposed adaptive dynamic optimization algorithms, including strategies of adaptive mutation and multiple groups. Once the sensor detects the pollution, the algorithm can locate the source in real-time and narrow the scope. Costa et al. (Costa et al. 2013) first found the paths between pollution points and sources with PBA, then evaluated possible sources, and finally searched the source in real-time according to the continuous reading of the sensor. Wang and Zhou (Wang and Zhou 2017) proposed the time series Bayesian method. First, obtain the probability distribution of sensor observations of polluted nodes in offline mode. Then calculate the posterior probability in real-time and decompose it into a hierarchical tree structure. Finally, the nodes with the highest likelihood are polluted ones in real time. Yan et al. (Yan et al. 2019b) simulated uncertain water demand through the Gaussian models and proposed a real-time positioning algorithm, which can find real pollution events with fewer sensor data in a short time.

The main algorithms of pollution source location are machine learning and simulation optimization. However, the latter is only suitable for small- or medium-sized pipe network nodes because of the slow solution speed.

Materials and methods

Simulation software

The simulation software of this work is EPANET (Rossman 2000), which can simulate and analyze the water supply system. EPANET traces chemical concentration, pipeline water flow, and node pressures by simulating the characteristics of reservoirs, valves, pumps, pipelines, and nodes, which is beneficial to water supply enterprises and scientific research.

In the EPANET water supply network system, any node can be used as the source water quality. In the time mode of source water quality, pollutants can be dynamically injected into a node and diffuse. Different ways of injecting pollutants at the source cause the actual pumping of pollutants to be changed. Source injection methods include mass injection, flow step injection, set point injection, and concentration injection.

The mass injection is to inject equal quality pollutants according to the set mass curve. Flow step injection is to insert fixed concentration pollutants into the water inlet end of the nodes, while set point injection is to insert the pollutants into the outlet. The above three methods are applied to simulate the invasion of tracer, disinfectant, and pollutant without considering the water demand of the nodes. However, concentration injection is suitable for simulating the source of water supply, because it changes the pollutant concentration at the source node, which is related to water demand.

Problem modeling

In the drinking water supply network, sensors at key nodes monitor water quality in real time. Once pollution is detected, it will warn and continuously record real-time water quality. With the increase of information collected, it is essential to locate the characteristics of pollution sources in real-time, such as the location of pollution sources, injection time, duration, and quality vector. Sensors are only arranged to key nodes owing to the large scale of the water supply network. It makes the location of pollution sources “symmetrical,” leading to the non-uniqueness of solutions. This problem belongs to multi-mode optimization (Hu et al. 2015; Yan et al. 2019c).

The real-time location of the pollution source is proposed based on the source location. From the perspective of simulation optimization, both are reverse-tracing problems, which is to solve the pollution source by comparing the difference between the cumulative concentration data of the potential pollution and the real. However, the simulation optimization process of pollution source location is static, while the other is dynamic. Figure 1 (a) shows that when the sensor detects pollution, only by accumulating enough concentration data can the optimization algorithm locate the pollution source. In previous studies, the cumulative concentrations calculated by the data model ranged from 24 to 48 h. When the pollution in the pipe network is detected, the source location cannot show the scope immediately, and the pollution will spread for 10–20 h. Therefore, this work proposes the real-time location of pollution sources. Figure 1 (b) shows that when pollution is detected, the real-time location of pollution sources will be optimized in a short time. The ever-increasing real observed concentration data makes the optimization algorithm more accurate. Unlike the cumulative concentration data of the data model of pollution source location, the real-time site is between the time when the pollution is first detected at the polluted time t0 and the current time tc. Moreover, the increased tc increases concentration data and the accuracy of the optimization algorithm.

Fig. 1
figure 1

Concentration comparison

The non-uniqueness of the solution is more challenging in the real-time location of pollution sources rather than the location. The increased monitoring data will optimize the solution to reduce the quantity. As for optimization, it is necessary to compare the real sensor concentration accumulation data with the simulated data at a specific time and take this as the individual fitness value. With the spread of pollution sources, the pollution concentrations monitored by sensors vary at different times, resulting in the inability to compare individual fitness values. Therefore, this work expresses the optimization problem as Eq. (1),

$$ {\displaystyle \begin{array}{c}\underset{\left\{L,\mathbf{M},{T}_0,D\right\}}{\mathit{\operatorname{Minimize}}\ f=}\sqrt{\frac{\sum \limits_{t={t}_0}^{t_c}\sum \limits_{i=1}^{Ns}{\left({c}_{it}^{obs}-{c}_{it}^{\ast}\right)}^2}{\sum \limits_{t={t}_0}^{t_c}\sum \limits_{i=1}^{Ns}{\left({c}_{it}^{obs}\right)}^2}}\\ {}\mathrm{Subject}\ \mathrm{to}\kern0.75em \mathbf{M}=\left\{{m}_1,{m}_2,\cdots, {m}_k\right\},\kern0.5em {m}_i\mathbf{\ge}0\\ {}\mathrm{and}\kern0.5em L=\left\{1,\cdots, \kern0.5em N\right\}\end{array}} $$
(1)

where f is the prediction error; t0 the time when the sensor first detects the pollution in the real scene; tc the current time; Ns the number of sensors; citobs the pollutant concentration of the observation sensor i at time t; cit* the pollutant concentration of sensor i when EPANET simulates potential pollution event at time t, which can be obtained by function {L,M,T0,D}; L the total number of nodes; M the injection vector of the pollution source; T0 the initial injection time; and D the injection duration. The optimization goal is to obtain the optimal solution {L,M,T0,D} by finding a minimum value of f. This formula calculates the similarity between the simulated and the real concentration from t0 to tc. The data on increasing pollution concentration increases with the increase tc, which leads to the optimization of the candidate solutions. Also, the degree of non-uniqueness of the problem reduces, that is, the number of potential pollution events decreases.

Therefore, the number of potential solutions for the real-time location of pollution sources will decrease with time until finding the source event. In this work, through the limited sensor data set to search the whole water supply network, we can obtain multiple possible pollution sources. When the real time is considered, the data increases. The constraint on the solution is more significant, which causes the non-uniqueness degree of the solution to decrease. Thus, this work transforms the problem of the real-time location of pollution sources into a dynamic multi-mode optimization problem.

Real-time location algorithm based on domain knowledge

In this work, a simulation-optimization algorithm is used to solve the real-time location of pollution sources. In the water supply network system, the sensor detects the pollution for the first time at time tc, which indicates that pollution intrusion exists. The pollution concentration information recorded by the sensor can be used as the real pollution source data after each time step Δt. In real life, when pollution is detected for the first time, relevant departments should launch a dynamic multi-mode optimization algorithm to search the pollution source information accurately. At a specific time tc, a series of calculated pollution events are input to EPANET to generate the simulated concentration data.

Comparing the real concentration data with the simulation, we can screen out hidden pollution events to locate the pollution source information as soon as possible. Figure 2 shows a detailed flowchart for solving the real-time location of pollution sources. When the difference between the cumulative simulated concentration and the actual cumulative monitoring is 0 or less than a specific threshold value at the time tc, the pollution event is the real pollution source (including pollution source location, injection time and duration, and quality vector).

Fig. 2
figure 2

Simulation-optimization framework

Algorithm architecture

It is critical to detect environmental changes and the corresponding methods in dynamic optimization problems, which can be solved by dynamic multi-group optimization algorithms (Cruz et al. 2011; Nguyen et al. 2012; Jin and Branke 2005; Yang and Li 2010). The environmental change is that the collection of pollution information at each time step is increasing, which means that the real-time location is a static optimization problem at a specific time.

At time t, k sub-populations iterative evolve and optimize simultaneous. During iterative evolution, the population searches globally through selection, crossover, and mutation. If an individual of the population satisfies certain conditions, it enters the local optimizer for a local search to improve the accuracy of the algorithm. The premise is to assume that the optimal individual of each sub-population is a feasible solution to the problem. Before entering the next moment, the population adopts strategies of merging populations, retaining excellent individuals and introducing random individuals to respond to changes. When k populations are optimized, the node positions of the optimal individuals of some populations may be the same, which indicates that populations with the same search direction need to merge. Environmental change will lead to the premature convergence of k sub-populations and increase search costs. Sub-population needs to keep the optimal individuals of the same node while replacing the deleted individuals with new ones generated randomly. Figure 3 shows the specific optimization process.

Fig. 3
figure 3

Algorithm framework

Selection strategy based on domain knowledge

Previous real-time location studies of pollution sources had ignored the moment when the water quality sensor first detected pollution, which was essential for accurately locating pollution sources. For example, in the BWSN1_Ostfeld2008 Avi et al. 2008 pipe network, the 44th node was injected with a pollution event. Figure 4 shows the pollution concentrations detected by sensors 10 and 100 in a day. The left side of the dotted line indicates that the pollution concentration is zero, while the time when the pollution is first detected is 6:10. According to the experience of experts, if the time t0* for the first detection of potential pollution is farther from t0, the less likely it is to be the real pollution.

Fig. 4
figure 4

Observed pollution concentrations

Priority is given to the optimization of individuals with A close to B, which can find the real pollution events. The processing formula is as Eq.(2).

$$ \left|{t}_0^{\ast}\hbox{-} {t}_0\right|<\varepsilon $$
(2)

Each individual in the population is a pollution event, and the first pollution monitoring time t0* can be obtained through EPANET. When the individual’s satisfies Eq. (2), it can be regarded as a potential pollution event and will not be eliminated. Considering the individual fitness value, if the potential individuals selected by each sub-population meet the Eq. (2) dominate, the roulette wheel selection is improved by individual fitness value. Otherwise, it is improved by the individual’s |t0*-t0| value. And the threshold ε=30 min.

Domain search strategy based on domain knowledge

At present, most studies have adopted single-point mutation for the injection start and duration time after the hybrid coding of pollution events. The two have no relevance, which makes the algorithm fall into local optimum. Experiments found that when the time series determined by the injection start and duration time highly coincides with the real pollution source, the mutation falls into local optimum. For example, the actual injection start and duration time are (2, 4), while the time sequence of (3,3) coincides with the real by 3, 4 and 5, which means the feasible solution containing the gene fragment may be a right solution. In the local optimum, the probability of (3,3) single point mutation to (2,4) is low, indicating that (3,3) is better than (5,4). Although the duration of (5,4) is consistent and (3,3) is inconsistent, individuals with (3,3) gene fragments have lower fitness values than (5,4). The above analysis shows that the higher the time series coincidence of pollution events, the smaller the prediction error. This work, from the time series of injection start and duration, proposed a domain search strategy. The start and duration time was taken as a whole to search for the time series with the highest degree of coincidence, keeping individual diversity and jumping out of the local optimum. The Harley coding principle adopted here made the time series obtained by the search different from itself by only one time period.

In Fig. 5, it is assumed that the time interval is 0–11 for a total of 12 h, where the gray time quantum indicates that the pollution source has been injected. The second line represents the time series corresponding to (2,4). The adjacent search directions may be different, but only search in four-time series that differ by one time period, which is represented by (3,3), (1,5), (2,3), and (2,5) represented by 3, 4, 5, and 6 in the figure. Randomly selecting one of the searches results in a corresponding new start and duration time.

Fig. 5
figure 5

Domain search process

Local search strategy based on domain knowledge

Individual gene coding methods include integer coding and variable-length real-number coding. During selecting, crossing, and mutation, the dynamic algorithm is easier to find the integer coding genes composed of node position, injection start, and duration time, while the variable-length real-number coding quality vector is difficult to converge to the exact value. Experiments showed the fitness value of the dynamic optimization algorithm based on Darwin’s idea dropped rapidly at the beginning and then slowly. It became invariant after iteration, which could not reach a minimal value. The algorithm accuracy was improved by searching for the quality vector separately and increasing the optimization cost. Referring to the analysis of improved selection strategies, when the potential individual |t0*-t0|=0, the quality of the individual is searched separately. In this work, the behavior of optimizing the quality vector of potential individuals satisfying |t0*-t0|=0 is called the local search strategy of the whole multi-group dynamic algorithm.

Particle swarm optimization (PSO) algorithm is widely used in engineering. It has a simple algorithm, fast convergence, and few adjustment parameters, which is suitable for optimizing continuous problems. Therefore, the standard PSO algorithm is used in the local search of the quality vector for the real-time location of pollution sources. In this paper, let ω=0.8, c1=c2=2.

Algorithm steps

This work proposed multiple strategies based on domain knowledge to analyze the characteristics of the real-time location of pollution sources. When the original Dynamic Multi-group Optimization algorithm (DMOA) was combined with the three improved strategies, the algorithm could quickly converge while improving accuracy. The steps of multi-strategy dynamic multi-mode optimization algorithm based on domain knowledge are as follows:

  • Step 1 Initialization: k sub-populations are initialized randomly, namely, p1, p2,…pk, where the initial value of k depends on the size of the network.

  • Step 2 Population iteration: Let the current time tc=t0t, and let the current iteration number g=0.

  • Step 2.1 Selection operator: Let g=g+1. Each sub-population adopts an improved selection strategy. That is when the potential individual of sub-population Pi dominates, the roulette wheel selection is improved by the prediction error. Otherwise, improved by the size of the individual |t0*-t0|.

  • Step 2.2 Crossover operator: Discrete coding (position and time variables) adopts a two-point crossover, while continuous coding (mass vector) adopts real number recombination.

  • Step 2.3 Mutation operator: The position and quality vectors adopt a single point mutation, while the time variable takes the improved proximity search strategy in 3.3.

  • Step 2.4 Condition of stopping iteration: when ggMax, update the fitness values and elite individuals of each sub-population and continue Step 2.1. Otherwise, continue Step 3.

  • Step 3 Local search: Considering the time cost of the algorithm, PSO search is performed separately on the quality vectors of the individuals satisfying |t0*-t0|=0 after each sub-population iteration.

  • Step 4 Response to changes: Only the best individuals with the same position in each population are retained after merging sub-populations, while new individuals replace the original ones.

  • Step 5 Condition of stopping the algorithm: When tc-t0<tMax, the algorithm ends. Otherwise, Step 2 continues.

Results and discussion

Setting of pipe network and algorithm parameters

This work compared the networks Net3_Rossman2000, BWSN1_Ostfeld2008, and KY5_Jolly2013 to verify the effectiveness of Multi-Strategy Dynamic Multi-Mode Optimization Algorithm (MSDMOA) based on domain knowledge. Table 1 shows the detailed settings of the three experimental pipe networks (Ostfeld et al. 2008). The number of nodes, pipes, reservoirs, and pools in the innovative pipeline networks determines their scales and complexity, which in turn determines the location and number of sensors.

Table 1 Pipe network parameters

Assuming that all experiments are single-source injections, the algorithm proposed in this work is compared with DMOA to analyze the results. Table 2 is the parameters of the algorithm proposed in this work.

Table 2 Algorithm parameters

The parameters of experimental equipment are as follows CPU is Intel® Core™ i5-3230 2.6 GHz, RAM is 8G, Hard Disk is 1 TB, and the computer operating system is win7 32bit.

Algorithm performance indicators

There are few existing evaluation indicators about the real-time location of pollution sources, but this work gives useful performance indicators based on the characteristics of the problem. The purpose of real-time positioning research is to locate the pollution source through an optimization algorithm when the water quality sensor detects the pollution, which minimizes the harm. The following are performance indicators.

Positioning accuracy: Due to the non-uniqueness of the solution to the real-time location of the pollution sources, the best solution of the dynamic algorithm may be one or a group of pollution events. The pollution position of the best solution is defined as {l1,…, lN}, N>0, where N is the optimal solution set size. If the actual pollution source position is ltrue∈{l1,…, lN}, the optimization locates the pollution source. In the same pollution scene, R times of the same experiments are used to locate Rsucceed times, and the positioning accuracy rate is p=Rsucceed/R.

Average fitness value for positioning success: The increase in observation time reduces the non-uniqueness of the problem, resulting in different observation time solutions. If the optimal solution set of the dynamic algorithm is N at the current time tc, then the fitness value is fcNi=lturefi/N. Since the result of inaccurate positioning is meaningless, the average value fc of successful positioning in multiple experiments is an important performance indicator.

Algorithm performance discussion

Net3_Rossman2000 pipe network

In this experiment, the network of Net3_Rossman2000 was used. Table 1 shows the parameters of the pipe network, while Fig. 6 shows the topological structure. Table 3 shows the settings of two pollution scenarios, in which the pollution concentration continuously injected is in 1 hour.

Fig. 6
figure 6

Net3_Rossman2000 pipe network

Table 3 Pollution scenarios of Net3_Rossman2000 pipe network

Fig.6 shows the optimization results of the two pollution scenarios after the algorithms MSDMOA and DMOA are conducted 30 times, and 2 h after the sensor first detects the pollution. Table 4 shows the results of the algorithm performance and comparison indicators proposed above.

Table 4 Results of Net3_Rossman2000 pipe network

Table 4 shows that the location accuracy of DMOA based on multi-strategy is better than that of the general, meaning that the ability of MSDMOA to search and jump out of local optimization is better than that of DMOA. Besides, the optimal fitness value and the average fitness value of successful localization of MSDMOA are less than that of DMOA, but the convergence degree is reversed. The above indicates that the start time, duration, and concentration curve of MSDMOA to obtain the optimal solution. However, although the accuracy of MSDMOA is better than DMOA in the pollution incidents at the 16th node, there is still room for improvement. The accuracy of the algorithm improves with the increased positioning time or sensors; however, because of the damage and cost of pollution, it can only do more efficient positioning in limited time and sensors.

Pollution events 16 and 86 were first detected by water quality sensors at 3:30 a.m. and 3:25 a.m., respectively. Figure 7 shows the average real-time location fitness values of the two dynamic algorithms within 2 h after the pollution is first detected, where the first performance of MSDMOA is similar to DMOA. However, in the later stage, MSDMOA’s fitness value decreases rapidly, and the searched pollution source is more accurate, which shows a better ability to jump out of local optimization and search. Besides, the pollution concentration detected by the sensor increases sharply at a specific time, which expands the environmental fluctuation. The average fitness curve risen slightly in the later period, indicating that the ability of MSDMOA to resist environmental changes needs to be improved. Figure 8 shows the changes in the number of optimal solutions to pollution sources overtime in 2 h, where MSDMOA performs better than DMOA, proving a strong search ability and fast convergence speed.

Fig. 7
figure 7

Time-varying graph of the average fitness value of the Net3_Rossman2000 pipe network

Fig. 8
figure 8

Time-varying graph of the solutions to the Net3_Rossman2000 pipe network

BWSN1_Ostfeld2008 pipe network

In this experiment, the BWSN1_Ostfeld2008 water supply pipe network was used. Table 1 shows the parameters of the pipe network, while Fig. 9 shows the topological structure. Table 5 shows the settings of the two pollution scenarios, where the pollution concentration continuously injected is in a unit of 1 h.

Fig. 9
figure 9

BWSN1_Ostfeld2008 pipe network

Table 5 BWSN1_Ostfeld2008 pollution scenes of the pipe network

Figure 9 shows the optimization results of two pollution scenarios after MSDMOA and DMOA are conducted 30 times, 2 h after the sensor first detects the pollution. Table 6 shows the results of the algorithm performance and standard comparison indexes.

Table 6 Results of BWSN1_Ostfeld2008 pipe network

Table 6 shows that MSDMOA is superior to DMOA in positioning accuracy and optimal fitness values, indicating that MSDMOA has more potent abilities to jump out of local optimum and search. The average fitness value of MSDMOA at successful positioning is better than that of DMOA. However, it is opposite in pollution event 23, because jumping out of the local position at the later stage results in insufficient convergence of the individual. In general, MSDMOA has a robust search performance, but there are individuals with insufficient convergence.

Pollution events 23 and 92 were first monitored by water quality sensors at 5:40 and 11:10 am, respectively. Figure 10 shows the average fitness value of the two dynamic algorithms for real-time localization within 2 h of the first monitoring of pollution. The fitness value and the upward fluctuation of the MSDMOA curve are small in the later period, showing a stronger ability to adapt to environmental changes. Figure 11 shows that the non-uniqueness degree of the solution to the real-time location of pollution sources decreases with time (the increase of data). The performance of MSDMOA is better than that of DMOA, indicating the strong search ability and fast convergence speed.

Fig. 10
figure 10

Time-varying graph of the average fitness value of BWSN1_Ostfeld2008

Fig. 11
figure 11

Time-varying graph of solutions number to BWSN1_Ostfeld2008

ky5_Jolly2013 pipe network

This experiment uses the KY5_Jolly2013 water supply pipe network. Table 1 shows the parameters of the pipe network, while Fig. 12 shows the topological structure. Table 7 shows the settings of the two pollution scenarios tested in the water supply network, where the continuously injected pollution concentration is in 1 hour.

Fig. 12
figure 12

KY5_Jolly2013 pipe network

Table 7 Pollution scenarios of KY5_Jolly2013 pipeline network

Figure 12 shows the optimization results of the two pollution scenarios after MSDMOA and DMOA are tested 30 times, while the pollution is first detected within 2 h. Table 8 shows the results according to the algorithm performance indicators and common comparison indicators.

Table 8 Results of KY5_Jolly2013 pipe network

Table 8 shows that the positioning accuracy and optimal fitness value of MSDMOA are better than DMOA, indicating more potent abilities to jump out of local optimum and search. The average fitness value of MSDMOA at successful positioning is better than DMOA, except for pollution incident 155. The non-uniqueness degree of the solution to the pollution event 155 in the pipeline network is higher than that of the first two pipelines, which makes the average fitness value challenging to converge. Therefore, MSDMOA has excellent stability to be used on the big pipe network.

The first time that a water quality sensor detected pollution events 31 and 155 was at 2:15 a.m. and 3:05 a.m., respectively. Figure 13 shows the average fitness values of the two dynamic algorithms for real-time localization within 2 h of the first monitoring of pollution. MSDMOA has a fast convergence speed and a strong ability to adapt to environmental changes. Figure 14 shows the changes in the number of optimal solutions to pollution sources over time within 2 h, where MSDMOA performs better than DMOA. However, the number of the solutions with the two algorithms on the pollution event 155 fail to converge, indicating that the non-uniqueness of the solution increases along with the pipe network scale.

Fig. 13
figure 13

ky5_Jolly2013 average fitness values time-varying graph

Fig. 14
figure 14

KY5_Jolly2013 solution numbers time-varying graph

Conclusions

In this work, the simulation optimization method was used to solve the real-time location of drinking water pollution sources. Water quality sensors were only set at critical nodes of the water supply network, resulting in multiple solutions. The gradual increase of the monitoring data led to the optimization of the solution and the reduction of non-uniqueness. Therefore, the dynamic multi-mode optimization algorithm based on domain knowledge was designed to solve multiple solutions. The multi-group merging mechanism prevented the sub-populations from gathering at the same pollution location as the environment changed. Meanwhile, to improve accuracy and avoid falling into local optimality, this work used domain knowledge to enhance the evolution step by a multi-strategy hybrid search. The simulation results in three different scale pipe networks showed that the difficulties of locating real pollution events were various under different scales, topology, and pollution scenarios. The positioning accuracy of the algorithm in this work was higher than that of unmodified DMOA, especially for the excellent ability to jump out of the local optimal. Moreover, when it was converted to the same fitness value, the convergence speed of the proposed algorithm was higher than that of the unmodified DMOA. The algorithm proposed in this paper can locate the pollution source quickly and meet the real-time requirement of the location of drinking water pollution sources. The proposed algorithm can promote the method and technology in the field of pollution sources location, which has important scientific and practical significance.