1 Introduction

Combined sewer overflow (CSO) pollution is a challenge facing urban water environment managers across the globe (Atauzzaman et al. 2022; Campisano et al. 2016; Liu et al. 2023; Quaranta et al. 2022; Salam 2024). In China, many cities are plagued by the return of black and odorous waters during the rainy season (Dai et al. 2020; Guan et al. 2023; Li et al. 2021; Pu et al. 2022; Wang et al. 2020), and CSO is one of the important causes of this problem. In this context, the demand for effective CSO control is becoming increasingly urgent.

For CSO pollution abatement, the fundamental tasks are to determine the rainfall intensity threshold and to calculate the frequency (or return period) and volume of CSO. Three types of relevant investigations have been conducted in the past decade. The first type determines the critical rainfall intensity threshold above which CSO may occur, for example based on 16 years of CSO and precipitation data in Cumberland, Maryland, USA (Bizer et al. 2022). The second predicts the return period of CSO based on multi-year rainfall records and empirical data from CSO outlets (Mailhot et al. 2015; Yu et al. 2013, 2018). The third evaluates the performance of CSO facilities and assesses the impact of climate change on CSO based on rainfall intensity thresholds (Andrés-Doménech et al. 2010; Fortier et al. 2015; Schroeder et al. 2011).

However, the above methods are all either simulation-driven (e.g. the storm water management model, SWMM) or observed-data-driven, and do not consider the physical mechanism of CSO occurrence, which leads to uncertainty in the calculation results. Hence, the low-return-period rainfall intensity formula was proposed to calculate the design discharge of interceptor sewers and to initially estimate the rainfall intensity threshold of CSO for a specific interceptor well (Liu et al. 2023). Meanwhile, the probability distribution of excess rainfall intensity (ERI) was investigated to estimate the volume of CSO based on peak-over-threshold (POT) sampling and the Generalized Pareto Distribution (GPD) (Liu et al. 2024). However, that approach takes only one threshold (the lower threshold) of CSO into consideration. Physically, the CSO volume from a specific interceptor well depends on the relationship between the transfer flow rate of the upstream combined sewer and the interception flow rate of the downstream interceptor sewer; CSO occurrence is therefore in essence a double-threshold phenomenon. In this context, the concept of excess rainfall intensity of double thresholds (ERId for short) is proposed and its frequency distribution is investigated in this work.

The rest of this study is organized as follows: Sect. 2 details the rationale and flowchart of the proposed method. Section 2.1 describes the rainfall event division scheme. Section 2.2 illustrates the sampling method of ERId. Section 2.3 describes the method for calculating the empirical frequency of ERId. Section 2.4 describes the method for selecting the frequency distribution of ERId. Section 2.5 introduces the study data. Results of the case study are presented in Sect. 3. Finally, the findings of this work are summarized in Sect. 4.

2 Methodology and Data

Based on the recorded long-term rainfall series, the flowchart for the derivation of the frequency distribution model of ERId is presented in Fig. 1.

Fig. 1 Flowchart of this study

2.1 Rainfall Event Division Method

CSO events originate from rainfall events, so the way rainfall events are divided is crucial for identifying CSO events and calculating their frequency. The minimum inter-event time (MIET) is a key parameter for rainfall event division. For the CSO scenario, the MIET between two rainfall events should be greater than or equal to the time of concentration of the tributary area of the interceptor well studied (Liu et al. 2023). For urban catchments connected to a combined sewer system, the time of concentration normally does not exceed 2–3 h (Zhang et al. 2016). Rainfall statistical analysis software (RSAS) was developed to divide rainfall events in this study (Liu et al. 2023).
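The division rule can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the RSAS implementation: the function name `split_rainfall_events`, the record layout (timestamp, depth in mm per step) and the convention that a dry gap runs from the end of one wet step to the start of the next are all hypothetical choices.

```python
from datetime import datetime, timedelta


def split_rainfall_events(records, miet_minutes=180, step_minutes=5):
    """Group (timestamp, depth_mm) records into rainfall events.

    A new event starts when the dry gap between two wet steps is at least
    the MIET. The gap is measured from the end of the previous wet step
    (assumption), so the step length is subtracted.
    """
    miet = timedelta(minutes=miet_minutes)
    step = timedelta(minutes=step_minutes)
    events, current, last_wet = [], [], None
    for ts, depth in records:
        if depth <= 0:
            continue  # dry steps only matter through the gap they create
        if last_wet is not None and (ts - last_wet - step) >= miet:
            events.append(current)
            current = []
        current.append((ts, depth))
        last_wet = ts
    if current:
        events.append(current)
    return events


# Hypothetical 5-min records: three wet steps, a long dry spell, one wet step.
t0 = datetime(2008, 7, 3, 3, 10)
demo_records = [
    (t0, 1.0),
    (t0 + timedelta(minutes=5), 0.5),
    (t0 + timedelta(minutes=60), 0.2),
    (t0 + timedelta(minutes=300), 0.4),
]
demo_events = split_rainfall_events(demo_records, miet_minutes=180)
```

With MIET = 3 h, the 50-min dry spell stays inside the first event, while the dry spell of almost 4 h opens a second one.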

2.2 Sampling Method of ERId

As mentioned above, CSO occurrence is a double-threshold phenomenon. Here, the lower threshold is the design discharge of the downstream interceptor sewers minus the average dry weather flow (Liu et al. 2024), while the upper threshold is the design discharge of the upstream combined sewers. If the actual flow rate is larger than the upper threshold, street flooding will occur; otherwise, the flow is conveyed to the interceptor well (or sewer). The calculation principle of ERId is detailed in Fig. 2.

Fig. 2 Principle of excess rainfall intensity of double thresholds (ERId)

Of these two thresholds, the design discharge of the combined sewers can be obtained from the rainfall intensity formula (intensity-duration-frequency curve or IDF curve), while that of the interceptor sewers can be calculated from the low-return-period rainfall intensity formula (Liu et al. 2023), as shown in Fig. 3. The form of the rainfall intensity formula used in China is shown in Eq. (1).

$$i = \frac{{{A_1}(1 + C\lg P)}}{{{{(t + b)}^n}}}$$
(1)

where i is the rainfall intensity, P is the return period, t is the rainfall duration, and A1, C, b, n are the local parameters of the specific city, which can be obtained by the maximum likelihood estimation method, Markov Chain Monte Carlo (MCMC) method, and so on.
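As a small illustration, Eq. (1) can be evaluated directly. The sketch below is only an example: the function name `rainfall_intensity` is hypothetical, the parameter values are placeholders rather than fitted values for any city, and the units (mm/min for i, min for t, years for P) are an assumption.

```python
import math


def rainfall_intensity(P, t, A1, C, b, n):
    """Eq. (1): i = A1 * (1 + C * lg P) / (t + b)**n.

    P: return period (assumed in years), t: rainfall duration (assumed in min).
    A1, C, b, n: local parameters; the values used below are placeholders.
    """
    return A1 * (1.0 + C * math.log10(P)) / (t + b) ** n


# Placeholder parameters, for illustration only (not from the case study).
params = dict(A1=10.0, C=0.8, b=10.0, n=0.7)
i_low = rainfall_intensity(P=1.0, t=30.0, **params)   # lower return period
i_high = rainfall_intensity(P=5.0, t=30.0, **params)  # higher return period
```

For a fixed duration, a higher return period gives a higher intensity, which is why each double-threshold pair has its lower value tied to the lower return period.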

Fig. 3 Calculation of ERId for the single rainfall event

According to Fig. 3, the total excess rainfall intensity (ERI) and the corresponding total CSO duration for a single rainfall event can be obtained. Normally, a specific CSO duration is selected for the calculation of ERId. ERId then equals the total ERI during the specific duration divided by this duration, and it is denoted as ERId(CSO duration). In this study, a 30-minute CSO duration is used to illustrate the calculation, giving ERId(30).
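One possible reading of this calculation is sketched below. It rests on assumptions that the text does not spell out: the excess at each time step is taken as the intensity clipped at the upper threshold minus the lower threshold (zero if negative), and ERId(30) is the largest moving-window mean of that excess over 30 min. The function name `erid_for_event` and the sample intensities are hypothetical.

```python
def erid_for_event(intensities, lower, upper, step_min=5, duration_min=30):
    """Maximum mean excess intensity over a window of `duration_min`.

    intensities: rainfall intensities (mm/min), one per time step.
    Excess per step (assumption): min(i, upper) - lower, floored at zero.
    """
    window = duration_min // step_min
    excess = [max(0.0, min(i, upper) - lower) for i in intensities]
    if len(excess) < window:
        # Pad events shorter than the window with dry steps.
        excess += [0.0] * (window - len(excess))
    running = sum(excess[:window])
    best = running
    for k in range(window, len(excess)):
        running += excess[k] - excess[k - window]  # slide the window one step
        best = max(best, running)
    return best / window  # mean excess intensity over the window


# Hypothetical 5-min intensities (mm/min) for one event.
demo_intensities = [0.1, 0.3, 0.5, 0.7, 0.5, 0.3, 0.1]
demo_erid30 = erid_for_event(demo_intensities, lower=0.26, upper=0.58)
```

Clipping at the upper threshold reflects the idea that flow beyond the capacity of the upstream combined sewer does not reach the interceptor well.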

After ERId is calculated for each single rainfall event, a number of ERId samples can be collected annually. Similar to rainfall intensity sampling for storm sewer design, the annual maxima sampling (AMS) method and the annual multiple sampling method can be used here. In order to focus on the frequency distribution characteristics of ERId, the ERId samples of all rainfall events were used for the subsequent frequency calculations in this work.

2.3 Calculation of the Empirical Frequency of ERId

All the ERId samples are sorted in descending order, and their corresponding empirical frequencies can be calculated according to the mathematical expectation formula, or Weibull formula (Weibull 1939), as shown below:

$$P = \frac{m}{{n + 1}} \times 100\%$$
(2)

where P is the empirical frequency, m is the rank of the sample in the descending series, and n is the number of samples.
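A minimal sketch of Eq. (2) follows; the function name `empirical_frequency` and the three sample values are hypothetical and serve only to show the ranking step.

```python
def empirical_frequency(samples):
    """Return (value, P in %) pairs using Eq. (2), P = m / (n + 1) * 100.

    Samples are ranked in descending order, so m = 1 is the largest value.
    """
    ordered = sorted(samples, reverse=True)
    n = len(ordered)
    return [(x, m / (n + 1) * 100.0) for m, x in enumerate(ordered, start=1)]


# Hypothetical ERId samples (mm/min).
demo_freq = empirical_frequency([0.1, 0.3, 0.2])
```

The largest sample receives the smallest exceedance frequency, and no sample is assigned 0% or 100%.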

2.4 Selection of Frequency Distribution for ERId

According to the double-threshold characteristics of ERId, its frequency distribution can be modeled using doubly censored distribution theory. In this study, the theory assumes that the main part of the CSO follows a distribution truncated at both ends, between the lower threshold value d and the upper threshold value u; i.e., the population distribution of ERId is the conditional distribution on the interval [d, u]. Let F(x, θ) be the distribution function of ERId and f(x, θ) the corresponding probability density function, where θ is a parameter that can be estimated using an optimization algorithm. The doubly censored distribution function and density function are given in Eqs. (3) and (4) (Chen 2019; Tian et al. 2014):

$$F(x;\theta |d \leqslant x \leqslant u) = \frac{{F(x,\theta ) - F(d,\theta )}}{{F(u,\theta ) - F(d,\theta )}}$$
(3)
$$f(x;\theta |d \leqslant x \leqslant u) = \frac{{f(x,\theta )}}{{F(u,\theta ) - F(d,\theta )}}$$
(4)

A conventional doubly censored distribution can be derived from Eqs. (3) and (4). The density functions of the normal and exponential distributions under doubly censored conditions, as given in similar computational studies, are shown in Eqs. (5) and (6).

Doubly censored normal distribution:

$$f(x|d \leqslant x \leqslant u) = \frac{{\exp \left[ { - \frac{{{{(x - \mu )}^2}}}{{2{\sigma ^2}}}} \right]}}{{\sqrt {2\pi } \sigma \left[ {\varphi \left( {\frac{{u - \mu }}{\sigma }} \right) - \varphi \left( {\frac{{d - \mu }}{\sigma }} \right)} \right]}}$$
(5)

Doubly censored exponential distribution:

$$f(x|d \leqslant x \leqslant u) = \frac{{\lambda {e^{ - \lambda x}}}}{{{e^{ - \lambda d}} - {e^{ - \lambda u}}}}$$
(6)

where d is the lower threshold parameter; u is the upper threshold parameter; µ is the mean; σ is the standard deviation; φ is the cumulative distribution function of the standard normal distribution; and λ is the parameter of the exponential distribution, which is estimated by the MCMC algorithm in this study.
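For the exponential case, Eqs. (3) and (6) translate directly into code. The sketch below is illustrative only; the function names `trunc_exp_pdf` and `trunc_exp_cdf` and the value of λ are hypothetical, and the thresholds are borrowed from the first double-threshold scenario purely as example numbers.

```python
import math


def trunc_exp_pdf(x, lam, d, u):
    """Eq. (6): density of the exponential distribution restricted to [d, u]."""
    if x < d or x > u:
        return 0.0
    return lam * math.exp(-lam * x) / (math.exp(-lam * d) - math.exp(-lam * u))


def trunc_exp_cdf(x, lam, d, u):
    """Eq. (3) with F(x) = 1 - exp(-lam * x)."""
    if x <= d:
        return 0.0
    if x >= u:
        return 1.0
    num = math.exp(-lam * d) - math.exp(-lam * x)
    return num / (math.exp(-lam * d) - math.exp(-lam * u))


# Example values only: lam is a placeholder, d and u are in mm/min.
demo_p = trunc_exp_cdf(0.40, lam=5.0, d=0.26, u=0.58)
```

Dividing by F(u) - F(d) rescales the density so that it integrates to one over [d, u], which is the only difference from the ordinary exponential density.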

In addition, several probability density distribution functions were also investigated in this study to analyze the frequency distribution characteristics of the ERId, such as lognormal, exponential, Gumbel, and Weibull distributions (Ahmad I 2019; Alhassoun 2011; Haddad et al. 2011; Yilmaz et al. 2014). The coefficient of determination R2 is a commonly used goodness-of-fit statistic in regression analysis to assess fitting performance (Fu et al. 2014). In this work, it was used to evaluate the accuracy of model fitting.
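The coefficient of determination can be computed from paired empirical and fitted frequencies as sketched below. The function name `r_squared` and the example numbers are hypothetical; the formula is the standard 1 - SS_res / SS_tot definition, which is assumed to be the one intended here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, R2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical empirical vs. fitted frequencies (%).
demo_r2 = r_squared([25.0, 50.0, 75.0], [27.0, 49.0, 74.0])
```

A value close to 1 means the fitted distribution reproduces the empirical frequencies closely.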

2.5 Study data

In this study, a ten-year recorded rainfall series (2008/01/01 to 2017/12/31) from a meteorological station in Sichuan Province, southwestern China, was used for illustration. Its temporal resolution is 5 min, as shown in Fig. 4.

Fig. 4 The recorded rainfall series for the case study (2008/01/01-2017/12/31)

3 Results and Discussions

3.1 Statistical Results of Rainfall Event Division

For MIET = 3 h, the original rainfall series was divided into 694 single rainfall events. The statistics of the rainfall events are shown in Table 1.

Table 1 Statistics of rainfall events

3.2 Sampling Results of ERId

In this study, eight double-threshold scenarios ([0.26,0.58], [0.33,0.76], [0.40,0.91], [0.49,1.11], [0.58,1.32], [0.65,1.46], [0.73,1.65], [0.83,1.88] mm/min) were used for sampling ERId from all the above rainfall events. The upper and lower thresholds were calculated from the corresponding rainfall intensity formula under the specific return period. Taking a CSO duration of 30 min as an example, the results of ERId(30) for a representative rainfall event (3:10 2008/7/3 to 6:50 2008/7/3) are shown in Table 2. For each rainfall event, the maximum of ERId(30) was sampled (only one maximum of ERId is sampled per rainfall event), as shown in Fig. 3. The sampling results of ERId(30) are shown in Table 3.

Table 2 Results of ERId for the typical rainfall event
Table 3 Sampling results for ERId(30)

Table 3 shows that: (1) The average values of ERId(30) are all larger than the median values, and Cs is larger than zero for all scenarios, which indicates that the frequency distribution of ERId is right-skewed. (2) The average value of Cv of ERId(30) across the double-threshold scenarios is close to 1, which shows that the ERId samples are not highly dispersed. (3) None of the kurtosis values is greater than 3, which indicates that the frequency distribution of ERId(30) is thin-tailed.
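The statistics discussed above can be computed as in the following sketch. The exact estimators behind Table 3 are not stated, so the choices here are assumptions: population (biased) moment estimators are used, and kurtosis is the Pearson (non-excess) form, for which a normal distribution gives 3. The function name `sample_statistics` and the sample values are hypothetical.

```python
import math


def sample_statistics(samples):
    """Mean, median, Cv, Cs and Pearson kurtosis from central moments."""
    n = len(samples)
    mean = sum(samples) / n
    ordered = sorted(samples)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "median": median,
        "Cv": std / mean,          # coefficient of variation
        "Cs": m3 / std ** 3,       # coefficient of skewness
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis
    }


# Hypothetical right-skewed sample: mean exceeds median and Cs is positive.
demo_stats = sample_statistics([1.0, 1.0, 2.0, 10.0])
```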

3.3 Results for Empirical Frequency of ERId(30)

The empirical frequencies of the ERId(30) samples were calculated using Eq. (2). Part of the empirical frequency results of ERId(30) for the double thresholds [0.26,0.58] are shown in Table 4. The empirical frequency distribution curves of ERId(30) for the different double-threshold scenarios are shown in Fig. 5. The empirical frequency distributions of the ERId(30) samples follow a similar trend in all double-threshold scenarios.

Table 4 Empirical frequency of ERId samples
Fig. 5 The empirical frequency of the ERId sample data

3.4 Results of the “theoretical” Frequency Distribution

In this study, the doubly censored exponential, exponential, Gumbel, Weibull, and lognormal distributions were selected to fit the frequency distribution of ERId(30). The MCMC algorithm was used to estimate the parameters of these distributions (Liu et al. 2021). The parameter optimization results are shown in Table 5, and the fitted doubly censored exponential distribution is shown in Fig. 6. The fitting performance of each distribution for ERId is compared in Table 6. The comparison reveals that the doubly censored exponential distribution and the exponential distribution are suitable for characterizing ERId, whereas the Gumbel, Weibull, and lognormal distributions are not. Among them, the doubly censored exponential distribution is the best theoretical frequency distribution for ERId.
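To make the parameter estimation step concrete, the sketch below estimates λ of the doubly censored exponential distribution in Eq. (6) with a basic random-walk Metropolis sampler. It is not the authors' MCMC implementation, whose details are not given here: the flat prior on λ > 0, the Gaussian proposal, the chain length, the synthetic data and all function names are assumptions made for illustration.

```python
import math
import random


def log_likelihood(lam, n, total, d, u):
    """Log-likelihood of Eq. (6) for n samples whose sum is `total`."""
    if lam <= 0.0:
        return float("-inf")
    norm = math.exp(-lam * d) - math.exp(-lam * u)
    if norm <= 0.0:
        return float("-inf")
    return n * (math.log(lam) - math.log(norm)) - lam * total


def metropolis_lambda(data, d, u, start=1.0, step=0.1,
                      n_iter=5000, burn_in=1000, seed=0):
    """Posterior mean of lam under a flat prior (random-walk Metropolis)."""
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    lam = start
    ll = log_likelihood(lam, n, total, d, u)
    chain = []
    for _ in range(n_iter):
        proposal = lam + rng.gauss(0.0, step)
        ll_new = log_likelihood(proposal, n, total, d, u)
        # Accept with probability min(1, L_new / L_old).
        if math.log(1.0 - rng.random()) < ll_new - ll:
            lam, ll = proposal, ll_new
        chain.append(lam)
    kept = chain[burn_in:]
    return sum(kept) / len(kept)


def sample_truncated_exponential(lam, d, u, size, seed=1):
    """Synthetic data by inverting the distribution function in Eq. (3)."""
    rng = random.Random(seed)
    a, b = math.exp(-lam * d), math.exp(-lam * u)
    return [-math.log(a - rng.random() * (a - b)) / lam for _ in range(size)]


# Synthetic check: data generated with lam = 3 on [0, 2] (placeholder values).
demo_data = sample_truncated_exponential(3.0, 0.0, 2.0, size=2000)
demo_lambda = metropolis_lambda(demo_data, 0.0, 2.0)
```

On synthetic data the estimate should land near the value used to generate it, which is a quick way to check a sampler before applying it to real ERId samples.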

Table 5 Parameter optimization results for doubly censored exponential distribution
Table 6 Comparison of the fitting performance for frequency distribution of ERId(30)
Fig. 6 Comparison of the empirical and the theoretical frequency distribution of ERId(30)

3.5 Discussions

In this work, a method for calculating the frequency distribution of ERId was proposed based on the double-threshold occurrence mechanism of CSO. It provides a relatively universal way of predicting the severity of CSO from a specific interceptor well based on the recorded rainfall series. The method requires two critical parameters: the areal unit flow rates of the upstream combined sewer and of the downstream interceptor sewer. Generally, they can be obtained from the original design archives of the combined sewer system or from on-site measurements.

In addition, rainfall losses in the net rainfall process (including depression storage and infiltration) were neglected in this study for the time being. Normally, these losses only affect the initial stage of the rainfall-runoff process in the tributary catchments, and the most intense rainfall seldom occurs in the early stage of a rainfall event. Meanwhile, if more accurate results are required, the actual dry weather flow of the combined sewer system should also be measured.

Moreover, the resulting frequency distributions of ERId (e.g. the doubly censored exponential distribution and the exponential distribution in this work) are site-specific. In particular, they depend on the rainfall data used, the condition of the sewers connected to the interceptor well, etc. Therefore, this result should be further verified in future research.

Finally, this work is helpful for evaluating or even predicting CSO behaviour for a specific interceptor well. Based on the frequency distribution of ERId derived from this method and the corresponding CSO duration, the volume of CSO can be calculated under given frequency conditions, which is useful for the rational design of CSO abatement and treatment facilities.
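As a rough illustration of that last step, the sketch below converts an ERId value into a CSO volume. The relation used (volume = ERId × CSO duration × tributary area) is an assumption rather than a formula stated in this work, and the function name `cso_volume_m3`, the tributary area and the runoff coefficient argument are hypothetical. The default runoff coefficient of 1.0 mirrors the neglect of rainfall losses in this study.

```python
def cso_volume_m3(erid_mm_per_min, duration_min, area_ha, runoff_coeff=1.0):
    """Approximate CSO volume (m3) for one ERId value.

    Assumed relation: depth (mm) = ERId * duration; 1 mm over 1 ha = 10 m3.
    """
    depth_mm = erid_mm_per_min * duration_min
    return depth_mm * area_ha * 10.0 * runoff_coeff


# Placeholder inputs: ERId(30) of 0.1 mm/min over a 5 ha tributary area.
demo_volume = cso_volume_m3(0.1, 30.0, 5.0)
```

Feeding in the ERId value that corresponds to a chosen frequency would give the design CSO volume for that frequency under these assumptions.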

4 Conclusions

In this study, a double-threshold method of excess rainfall intensity (ERId) was proposed for characterizing CSO. For a specific interceptor well, the upper threshold is the design areal unit discharge (the equivalent rainfall intensity) of its upstream combined sewer, and the lower threshold is the design areal unit discharge of its downstream interceptor sewers minus the dry weather flow rate. In addition, the frequency distribution of ERId was initially investigated. Finally, a case study was conducted based on a 10-year rainfall time series. Eight sets of double thresholds ([0.26,0.58], [0.33,0.76], [0.40,0.91], [0.49,1.11], [0.58,1.32], [0.65,1.46], [0.73,1.65], [0.83,1.88] mm/min) and one typical CSO duration (30 min) were examined. The results showed that: (1) The excess rainfall intensity of double thresholds (ERId) is suitable for characterizing CSO. (2) The frequency distribution of ERId is right (positively) skewed according to the statistics of the ERId samples. (3) The kurtosis of the ERId samples is not greater than 3 for any of the thresholds studied, which indicates that the distribution is thin-tailed. (4) In this study, the optimal frequency distribution function for ERId is the doubly censored exponential distribution.