1 Introduction

In China, urban water shortages are becoming increasingly serious. At present, the total water shortage in China’s cities has reached 6 billion \( m^{3} \). The United Nations 2018 World Water Development Report projected that China’s population would exceed 1.6 billion by 2030, so the shortage could become even worse by then. The water sector is increasingly shifting its focus toward demand-side management strategies to improve water efficiency. As one of the main factors driving total water use, urban household water consumption is attracting the attention of policymakers (Abu-Bakar et al. 2021). The water-saving management contract (WSMC) is an effective way to improve the efficiency of urban household water supply by helping water users reduce water waste (Hu et al. 2021). Demand forecasting is a prerequisite for efficient supply management (Momenitabar et al. 2022; Pu et al. 2023; Liu et al. 2023). Practice has repeatedly shown that improving the accuracy of demand forecasts is an effective way to increase the efficiency of supply management (Momenitabar et al. 2023, 2022). Therefore, accurately predicting urban household water demand (UHWD) from historical data is a problem that needs to be solved before a WSMC project is implemented (Taneja et al. 2019).

Traditional UHWD prediction studies have focused on water users and water devices. Per capita approaches calculate total demand from population and water use per capita. Other approaches take into account the amount of water consumed at the end uses (Yuan et al. 2014). Since extrapolation methods were developed for long-term UHWD prediction, more studies have examined the responses of UHWD to associated factors, such as income (Qi and Chang 2011), water price (Howe and Linaweaver 1967; Gaudin 2006), family education and age (Lyman 1992), household size (Danielson 1979), dwelling type (Fox et al. 2009), and so forth. Traditional approaches provide preliminary ways of forecasting UHWD; however, these methods analyze the factors influencing UHWD at the macro-level and do not account for uncertainty in the UHWD prediction process (Donkor et al. 2014).

Traditional probabilistic statistics is one of the most important mathematical methods for dealing with uncertainty in prediction problems. Time series analysis is a common forecasting tool, and it describes the uncertainty of prediction outcomes by means of random variables. The traditional autoregressive, moving average, and autoregressive integrated moving average models have been deeply studied and widely applied (Safaei et al. 2022). Further time series models have been developed to solve new problems. Some scholars have used evolutionary algorithms to estimate the parameters of multi-step time series analysis (Kumar et al. 2022). Nonlinear time series analysis has developed rapidly as complex network theory has improved (Wang et al. 2022; Zhou et al. 2023). In recent years, hybrids of machine learning and traditional time series analysis have been proposed (Meng et al. 2022; Salloom et al. 2022).

Uncertain statistics is another mathematical technique for characterizing uncertainty in prediction problems. Unlike probabilistic statistics, which is based on probability theory, uncertain statistics is based on uncertainty theory. The primary difference between the two theories is that probability theory is a “multiplication” mathematical system, in which the probability of a product of independent events is the product of their probabilities, whereas uncertainty theory is a “minimum” mathematical system, in which the uncertain measure of a product of events is the minimum of their uncertain measures. Uncertainty theory has a history of nearly 15 years, dating back to the work of Liu in 2007. Based on the normality, duality, and subadditivity axioms (Liu 2007) and the product axiom (Liu 2009), Liu established uncertainty theory and extended it into many fields. Uncertainty theory has since attracted the interest of many scholars, including those working in statistics.

An early branch of uncertain statistics is uncertain regression analysis, which was developed by Yao and Liu (2017). Later, Lio and Liu studied the residuals of an uncertain regression model and extended the predicted value to a confidence interval (Lio and Liu 2018). Parameter estimation is the crucial step of regression analysis. After the least-squares method was first adopted to estimate parameters, the least absolute deviations estimation (Liu and Yang 2020), Tukey’s biweight estimation (Chen 2020), and the uncertain maximum-likelihood estimation (Lio and Liu 2020) were developed to improve parameter estimation. For data that are ordered in time, more useful information can be extracted from the series itself. Yang and Liu proposed uncertain time series analysis to deal with such data (Yang and Liu 2019). They described the time series data as uncertain variables and developed an uncertain autoregressive model. Tang developed the uncertain threshold autoregressive model (Tang 2022). To estimate the unknown coefficients of the uncertain autoregressive model, a least-squares method was used first. Then, Yang et al. developed a least absolute deviations method (Yang et al. 2020). Later, Zhao et al. derived the analytic solution of the uncertain autoregressive model according to the principle of least squares (Zhao et al. 2020). To determine the order of the uncertain autoregressive model, a cross-validation process was developed by Liu and Yang (2022). To judge whether given hypotheses are correct, Ye and Liu presented an uncertain hypothesis test, which provides a way to evaluate the goodness of fit of the estimates in an uncertain statistical model (Ye and Liu 2022). At present, uncertain time series analysis has been successfully applied to prediction problems such as COVID-19 (Lio 2021) and carbon dioxide emissions (Chen and Yang 2021).

Uncertain statistics can be used to characterize circumstances in which random variables fail. When a variable’s distribution function is not close to its frequency, the variable is not random but uncertain (Liu 2023). For example, it is not suitable to employ probabilistic statistics to deal with the residuals of a time series prediction model if they do not fulfill the white noise requirements. In this study, an uncertain time series model is used to analyze and predict the UHWD based on observed series data. The results of the uncertain time series model are then compared with those of traditional models. Finally, the findings of the paper and future directions are discussed. The paper is structured as follows. In the next section, the fundamentals of time series analysis and uncertain time series analysis are introduced, including least-squares estimation, residual analysis, the forecast value, and the confidence interval. Section 3 presents a case study of Handan, a Chinese city, and reports the forecast results. Section 4 provides a discussion, and Sect. 5 summarizes the paper.

2 Preliminaries

In this section, the basic knowledge of time series analysis and the basic concepts of uncertain time series analysis will be introduced.

The theoretical development of time series analysis began with stochastic processes fairly early on. G. U. Yule’s work in the 1920s was the first practical application of autoregressive models to data (Yule 1927). Once we collect the observed data \( X_{1}, X_{2}, \cdots , X_{n} \), the traditional p-order autoregressive model is shown in Eq. (1)

$$\begin{aligned} X_{t} = \sum _{i=1}^{p} \phi _{i}X_{t-i} + \varepsilon _{t}^{'}, \end{aligned}$$
(1)

where \( \phi _{i} \) are unknown parameters and \(\varepsilon _{t}^{'}\) is the disturbance term, a random variable assumed to be white noise. Liu pointed out that there are sequences in practice for which the frequency of the residuals produced by the time series model is not close to the distribution function of \(\varepsilon _{t}^{'}\), that is, the white noise assumption is not satisfied (Liu 2023). For such time series, the uncertain time series model should be used. For the observed data \( X_{1}, X_{2}, \cdots , X_{n} \), Yang and Liu (2019) proposed the uncertain autoregressive model given in Eq. (2) and denoted it by uncertain AR(k)

$$\begin{aligned} X_{t} =a_{0} + \sum _{i=1}^{k} a_{i}X_{t-i} + \varepsilon _{t}, \end{aligned}$$
(2)

where \( a_{0}, a_{1}, \cdots , a_{k} \) are unknown parameters, \( \varepsilon _{t} \) is an uncertain disturbance term which is an uncertain variable, and k is the order of the autoregressive model.

The unknown parameters \( a_{0}, a_{1}, \cdots , a_{k} \) in the uncertain AR(k) model (2) are estimated by least squares, that is, by solving the minimization problem

$$\begin{aligned} \min \limits _{a_{0}, a_{1}, \cdots , a_{k}} \sum _{t=k+1}^{n} \left( X_{t}-a_{0}-\sum _{i=1}^{k}a_{i}X_{t-i}\right) ^{2}. \end{aligned}$$
(3)

By solving the above single objective optimization problem (3) , the optimal solution is denoted by \( \hat{a}_{0}, \hat{a}_{1}, \cdots , \hat{a}_{k} \). Then, the fitted AR(k) model is

$$\begin{aligned} X_{t} = \hat{a}_{0} + \sum _{i=1}^{k} \hat{a}_{i}X_{t-i}. \end{aligned}$$

For each index t \( (t=k+1, k+2, \dots , n) \), the tth residual is defined as the difference between the actual observed value and the predicted value, that is

$$\begin{aligned} \varepsilon _{t}^{*} = X_{t} - \hat{a}_{0} - \sum _{i=1}^{k} \hat{a}_{i}X_{t-i}. \end{aligned}$$
(4)
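The least-squares step (3) and the residuals (4) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the MATLAB routine used in this paper; the function names `fit_ar_least_squares` and `residuals` are hypothetical, and the normal equations are solved by plain Gaussian elimination, which is adequate only for small orders k.

```python
def fit_ar_least_squares(x, k):
    """Return [a0, a1, ..., ak] minimizing the sum of squares in problem (3)."""
    n = len(x)
    if n < 2 * k + 1:
        raise ValueError("need at least 2k + 1 observations")
    # One design row [1, X_{t-1}, ..., X_{t-k}] per t = k+1..n (0-based: k..n-1).
    rows = [[1.0] + [x[t - i] for i in range(1, k + 1)] for t in range(k, n)]
    targets = [x[t] for t in range(k, n)]
    m = k + 1
    # Augmented normal equations (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - known) / aug[i][i]
    return coeffs


def residuals(x, coeffs):
    """Residuals of Eq. (4) for t = k+1..n, given fitted coefficients."""
    k = len(coeffs) - 1
    return [
        x[t] - coeffs[0] - sum(coeffs[i] * x[t - i] for i in range(1, k + 1))
        for t in range(k, len(x))
    ]
```

For a series generated exactly by \( X_{t} = 2 + 0.5X_{t-1} \), the sketch recovers the two coefficients and returns residuals that vanish up to rounding error.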

If the uncertain disturbance terms \( \left\{ \varepsilon _{1}, \varepsilon _{2}, \cdots \right\} \) are a sequence of independent and identically distributed uncertain variables, then the expected value of the uncertain disturbance term is estimated by the average of residuals according to Eq. (5)

$$\begin{aligned} \hat{e} =\frac{1}{n-k}\sum _{t=k+1}^{n}\varepsilon _{t}^{*}. \end{aligned}$$
(5)

And the variance of the uncertain disturbance term is estimated as Eq. (6)

$$\begin{aligned} {\hat{\sigma }}^{2} = \frac{1}{n-k}\sum _{t=k+1}^{n}\left( \varepsilon _{t}^{*}- \hat{e}\right) ^{2}. \end{aligned}$$
(6)
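Estimates (5) and (6) are the sample mean and the sample variance of the residuals, both with divisor \( n-k \). A minimal sketch follows; the function name `disturbance_estimates` is hypothetical.

```python
import math


def disturbance_estimates(res):
    """Return (e_hat, sigma_hat) from the residuals, following Eqs. (5) and (6)."""
    m = len(res)  # m = n - k residuals
    e_hat = sum(res) / m
    variance = sum((r - e_hat) ** 2 for r in res) / m  # divisor n - k, as in Eq. (6)
    return e_hat, math.sqrt(variance)
```

When the fitted model contains the intercept \( a_{0} \), least-squares residuals sum to zero, so \( \hat{e} \) is zero up to rounding error.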

Therefore, the estimated disturbance term \( {\hat{\varepsilon }}_{n+1} \) may be assumed to be a normal uncertain variable with expected value \( \hat{e} \) and variance \( {\hat{\sigma }}^{2} \). Yang and Liu (2019) suggested that the forecast uncertain variable of \( X_{n+1} \) based on \( X_{1}, X_{2}, \cdots , X_{n} \) is determined by

$$\begin{aligned} \hat{X}_{n+1} = \hat{a}_{0} + \sum _{i=1}^{k}\hat{a}_{i}X_{n+1-i} + {\hat{\varepsilon }}_{n+1}, {\hat{\varepsilon }}_{n+1}\sim \mathcal {N}(\hat{e}, {\hat{\sigma }}), \end{aligned}$$
(7)

and the forecast value of \( X_{n+1} \) is defined as the expected value of the forecast uncertain variable \( \hat{X}_{n+1} \), that is

$$\begin{aligned} {\hat{\mu }} = \hat{a}_{0} + \sum _{i=1}^{k}\hat{a}_{i}X_{n+1-i} + \hat{e}. \end{aligned}$$
(8)

According to (7), (8) and the operational law of uncertain variables, the forecast uncertain variable \( \hat{X}_{n+1} \) has a normal uncertainty distribution \( \mathcal {N}({\hat{\mu }}, {\hat{\sigma }}) \), that is

$$\begin{aligned} {\hat{\Psi }}(z)=\left( 1 + \textrm{exp}\left( \frac{\pi ({\hat{\mu }}-z)}{\sqrt{3}{\hat{\sigma }}}\right) \right) ^{-1}. \end{aligned}$$
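The normal uncertainty distribution above and its inverse (the form used later for \( \Phi ^{-1} \)) are simple closed-form expressions. The sketch below evaluates both; the function names are hypothetical, and the overflow guard is an implementation detail added here rather than part of the theory.

```python
import math


def normal_uncertainty_cdf(z, mu, sigma):
    """Normal uncertainty distribution N(mu, sigma) evaluated at z."""
    exponent = math.pi * (mu - z) / (math.sqrt(3.0) * sigma)
    if exponent > 700.0:
        # exp() would overflow; the distribution value is effectively 0 here.
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))


def normal_uncertainty_inverse(alpha, mu, sigma):
    """Inverse uncertainty distribution of N(mu, sigma) for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    scale = sigma * math.sqrt(3.0) / math.pi
    return mu + scale * math.log(alpha / (1.0 - alpha))
```

The distribution takes the value 0.5 at \( z = {\hat{\mu }} \), and composing the two functions returns the original belief degree.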

Taking \( \beta \) (e.g., \( 95\% \)) as the confidence level, it is verified that (Ye and Yang 2021)

$$\begin{aligned} \hat{b} = \frac{{\hat{\sigma }}\sqrt{3}}{\pi } \ln \frac{1+\beta }{1-\beta } \end{aligned}$$

is the minimum value b, such that

$$\begin{aligned} {\hat{\Psi }}({\hat{\mu }}+b) - {\hat{\Psi }}({\hat{\mu }}-b) \ge \beta . \end{aligned}$$

Since

$$\begin{aligned} \mathcal {M} \{{\hat{\mu }}-\hat{b} \le \hat{X}_{n+1} \le {\hat{\mu }}+\hat{b} \} \ge {\hat{\Psi }}({\hat{\mu }}+\hat{b}) - {\hat{\Psi }}({\hat{\mu }}-\hat{b}) = \beta , \end{aligned}$$

Yang and Liu (2019) suggested that the \( \beta \) confidence interval of \( \hat{X}_{n+1} \) is

$$\begin{aligned} {\hat{\mu }} \pm \frac{{\hat{\sigma }}\sqrt{3}}{\pi } \ln \frac{1+\beta }{1-\beta }. \end{aligned}$$
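The confidence interval can be computed directly from \( {\hat{\mu }} \), \( {\hat{\sigma }} \), and \( \beta \). The following sketch also evaluates \( {\hat{\Psi }}({\hat{\mu }}+b) - {\hat{\Psi }}({\hat{\mu }}-b) \) so that the equality with \( \beta \) at \( b = \hat{b} \) can be checked numerically; all function names are hypothetical.

```python
import math


def confidence_half_width(sigma, beta=0.95):
    """Half-width b_hat of the beta confidence interval."""
    return sigma * math.sqrt(3.0) / math.pi * math.log((1.0 + beta) / (1.0 - beta))


def confidence_interval(mu, sigma, beta=0.95):
    """Return (lower, upper) = mu -/+ b_hat."""
    b = confidence_half_width(sigma, beta)
    return mu - b, mu + b


def coverage(sigma, b):
    """Psi(mu + b) - Psi(mu - b) for N(mu, sigma); the value does not depend on mu."""
    x = math.pi * b / (math.sqrt(3.0) * sigma)
    return 1.0 / (1.0 + math.exp(-x)) - 1.0 / (1.0 + math.exp(x))
```

At \( b = \hat{b} \) the exponential term equals \( (1+\beta )/(1-\beta ) \), so the coverage reduces to \( \frac{1+\beta }{2} - \frac{1-\beta }{2} = \beta \), which is why \( \hat{b} \) is the smallest admissible half-width.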

Uncertain time series analysis can be used to predict the future value \( X_{n+1} \) once the previously observed values \( X_{1}, X_{2}, \cdots , X_{n} \) are available. An important question is whether the forecast uncertain variable (7) of \( X_{n+1} \) is appropriate. The disturbance term in the forecast uncertain variable (7) was assumed to be an uncertain variable with a normal uncertainty distribution, that is

$$\begin{aligned} {\hat{\varepsilon }}_{n+1} \sim \mathcal {N} (\hat{e}, {\hat{\sigma }}). \end{aligned}$$

Thus, to evaluate the appropriateness of the forecast uncertain variable (7) is to evaluate the appropriateness of the estimated disturbance term \( {\hat{\varepsilon }}_{n+1} \). To describe the appropriateness of \( {\hat{\varepsilon }}_{n+1} \), we consider the hypotheses below

$$\begin{aligned} H_{0}: e = \hat{e} \text { and } \sigma = {\hat{\sigma }} \text { versus } H_{1}: e \ne \hat{e} \text { or } \sigma \ne {\hat{\sigma }}. \end{aligned}$$
(9)

Ye and Liu (2022) suggested that the test for the hypotheses (9) at a given significance level \( \alpha \) (e.g., 0.05) is

$$\begin{aligned} \begin{aligned} W&= \bigg \{ ( z_{k+1}, z_{k+2}, \cdots , z_{n} )\\&\text {: there are at least } \alpha \text { of indexes } t\text {'s with } \\&k+1 \le t \le n \text { such that } z_{t}<\Phi ^{-1} \left( \frac{\alpha }{2} \right) \\&\text { or } z_{t}>\Phi ^{-1} \left( 1-\frac{\alpha }{2} \right) \bigg \}, \end{aligned} \end{aligned}$$
(10)

where

$$\begin{aligned} \Phi ^{-1}(\alpha ) = \hat{e} + \frac{{\hat{\sigma }}\sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

If \( (\varepsilon _{k+1}^{*}, \varepsilon _{k+2}^{*}, \cdots , \varepsilon _{n}^{*}) \notin W \), then the forecast uncertain variable (7) of \( X_{n+1} \) is appropriate. Otherwise, this forecast uncertain variable is inappropriate. In this case, reasons for the inappropriate forecast uncertain variable (7) may lie in the estimation method.
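The rejection region (10) amounts to counting how many residuals fall outside the interval \( [\Phi ^{-1}(\alpha /2), \Phi ^{-1}(1-\alpha /2)] \) and comparing that count with \( \alpha (n-k) \). A minimal sketch is given below; the function name `rejects_h0` is hypothetical, and "at least \( \alpha \) of the indexes" is read here as "a count of at least \( \alpha (n-k) \)", which is the reading used in the case study of this paper.

```python
import math


def rejects_h0(res, e_hat, sigma_hat, alpha=0.05):
    """Return True if the residual vector lies in the rejection region W of (10)."""
    scale = sigma_hat * math.sqrt(3.0) / math.pi
    lower = e_hat + scale * math.log((alpha / 2.0) / (1.0 - alpha / 2.0))
    upper = e_hat + scale * math.log((1.0 - alpha / 2.0) / (alpha / 2.0))
    # Count residuals strictly outside [lower, upper].
    outside = sum(1 for z in res if z < lower or z > upper)
    # Reject when the share of outlying residuals reaches alpha.
    return outside >= alpha * len(res)
```

With 24 residuals and \( \alpha = 0.05 \), a single outlying residual does not lead to rejection, whereas two do.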

3 Uncertain time series analysis for UHWD series cases

The annual UHWD series of Handan city from 1993 to 2017 is shown in Table 1 and Fig. 1. The data were collected from Li (2020).

Table 1 UHWD series data of Handan from 1993 to 2017 \( (10^{4} m^{3}) \)
Fig. 1 UHWD series data of Handan from 1993 to 2017

We use \( X_{1}, X_{2}, \cdots , X_{25}\) to denote the above water demand series; that is, for each t \( (t=1, 2, \cdots , 25) \), \( X_{t} \) represents the UHWD of Handan in year \( 1992+t \). For instance, \( X_{1} \) refers to the UHWD of Handan in 1993, and \( X_{25} \) refers to the UHWD of Handan in 2017. We employ fixed-origin cross-validation to determine the optimal order k of the autoregressive model AR(k) (Liu and Yang 2022). Based on the observed data, we consider orders from 1 to 6, and the results show that \( k=1 \) is the optimal order. Thus, using calculations implemented in MATLAB R2014a, we obtain the fitted AR(1) model

$$\begin{aligned} X_{t} = 679.4 + 0.8932X_{t-1}. \end{aligned}$$
(11)
Fig. 2 Residuals of AR(1) model (11)

Figure 2 shows the residuals of model (11). Judging from Fig. 2, the residuals do not appear to come from the same population. Hence, it is necessary to test whether the samples come from the same population. The residuals of model (11) are also listed in Table 2.

Table 2 Residuals in uncertain AR(1) model (11)

We select two groups of samples from the residuals and apply the two-sample Kolmogorov–Smirnov test to them. The result shows that the two selected groups are not from the same population at the \( 5 \% \) significance level, with an asymptotic p value of 0.0111. Hence, the residuals are not from the same population. This result indicates that the estimated distribution function is not close enough to the frequency, which means that the disturbance term is not a random variable. In this case, we can use an uncertain variable to describe the disturbance term and apply uncertain time series analysis to predict the UHWD. Based on the residuals in Table 2, we can estimate the expected value and variance of the disturbance term. Then, the forecast uncertain variable of \( X_{26} \) (UHWD of Handan in 2018) based on \( X_{1}, X_{2}, \cdots , X_{25} \) is determined by

$$\begin{aligned} \hat{X}_{26} = 679.4 + 0.8932X_{25} + {\hat{\varepsilon }}_{26}, \quad {\hat{\varepsilon }}_{26} \sim \mathcal {N}(0.000, 743). \end{aligned}$$
(12)

To test whether the forecast uncertain variable (12) is appropriate, we consider the following hypotheses:

$$\begin{aligned} H_{0}: e = 0.000 \text { and } \sigma = 743 \text { versus } H_{1}: e \ne 0.000 \text { or } \sigma \ne 743. \end{aligned}$$
(13)

Given a significance level \( \alpha =0.05 \), we have

$$\begin{aligned} \Phi ^{-1} \left( \frac{\alpha }{2}\right) = -1500.9, \quad \Phi ^{-1} \left( 1 - \frac{\alpha }{2}\right) =1500.9, \end{aligned}$$

where \( \Phi ^{-1} \) is the inverse uncertainty distribution of \( \mathcal {N}(0.000, 743) \), that is

$$\begin{aligned} \Phi ^{-1}(\alpha ) = 0.000 + \frac{743\sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

Since \( \alpha \times 24 = 1.2 \), it follows from (10) that the test for the hypotheses (13) is:

$$\begin{aligned} \begin{aligned} W&= \{ ( z_{2}, z_{3}, \cdots , z_{25} )\\ {}&\quad \text {: there are at least } 2 \text { of indexes } t\text {'s with } \\&\quad 2 \le t \le 25 \text { such that } z_{t}<-1500.9 \text { or } z_{t}>1500.9 \}. \end{aligned} \end{aligned}$$
(14)
Fig. 3 Residuals of uncertain AR(1) model and the test

As shown in Fig. 3, since all the residuals fall into the interval \( [-1500.9, 1500.9] \), we have \( (\varepsilon _{2}^{*}, \varepsilon _{3}^{*}, \cdots , \varepsilon _{25}^{*}) \notin W \). Thus, the forecast uncertain variable (12) of \( X_{26} \) based on \( X_{1}, X_{2}, \cdots , X_{25} \) is appropriate. Accordingly, the forecast value of \( X_{26} \) is the expected value of \( \hat{X}_{26} \), that is

$$\begin{aligned} {\hat{\mu }} = 7203, \end{aligned}$$

and the \( 95 \% \) confidence interval is

$$\begin{aligned} 7203 \pm \frac{743\sqrt{3}}{\pi } \ln \frac{1+0.95}{1-0.95}, \end{aligned}$$

i.e., \( 7203 \pm 1500.9 = [5702.1, 8703.9] \).
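The numbers in this section can be re-derived from the reported estimates alone. The sketch below uses only values printed above (\( \hat{e} = 0.000 \), \( {\hat{\sigma }} = 743 \), \( {\hat{\mu }} = 7203 \), 24 residuals); the variable names are hypothetical, and the underlying data of Tables 1 and 2 are not used.

```python
import math

# Values reported in the text (rounded as printed).
e_hat = 0.0
sigma_hat = 743.0
mu_hat = 7203.0
alpha = 0.05
beta = 0.95
n_residuals = 24  # t = 2..25

scale = sigma_hat * math.sqrt(3.0) / math.pi

# Bounds of the hypothesis test: Phi^{-1}(alpha/2) and Phi^{-1}(1 - alpha/2).
lower_bound_test = e_hat + scale * math.log((alpha / 2.0) / (1.0 - alpha / 2.0))
upper_bound_test = e_hat + scale * math.log((1.0 - alpha / 2.0) / (alpha / 2.0))

# Smallest integer count of outlying residuals that reaches alpha * 24 = 1.2.
min_outliers = math.ceil(alpha * n_residuals)

# Half-width and end points of the 95% confidence interval.
half_width = scale * math.log((1.0 + beta) / (1.0 - beta))
lower, upper = mu_hat - half_width, mu_hat + half_width

print(f"test bounds: [{lower_bound_test:.1f}, {upper_bound_test:.1f}]")
print(f"minimum number of outlying residuals for rejection: {min_outliers}")
print(f"95% confidence interval: [{lower:.1f}, {upper:.1f}]")
```

Because \( \ln \frac{1-\alpha /2}{\alpha /2} = \ln \frac{1+\beta }{1-\beta } \) when \( \beta = 1-\alpha \) and \( \hat{e} = 0 \), the test bounds and the confidence half-width coincide. With the rounded \( {\hat{\sigma }} = 743 \), the sketch gives a half-width of roughly 1500.7; the small gap from the reported 1500.9 presumably reflects rounding of \( {\hat{\sigma }} \) in the text, which is an assumption made here rather than something stated in the source.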

4 Discussion

4.1 Compared with the traditional methods

In this section, we compare the results of this paper with those of traditional methods.

(1) The samples are not from the same population, so random time series analysis is not applicable. The residuals of the estimated model are not from the same population, and hence, the disturbance term should not be treated as a random variable. The water demand of an area is subject to many uncertainties, such as population, economics, and environment. Many studies point out that these uncertainties are not suitable for description by probabilistic methods (Keilman 2020; Sun et al. 2018; Alamanos et al. 2020). The fact that the disturbance term samples are not from the same population shows that the distribution function is not close enough to the observed frequency. In this case, uncertainty theory is a better tool than probability theory for dealing with the residuals, and the disturbance term should be regarded as an uncertain variable.

(2) Li (2020) offered two approaches for predicting UHWD: LASSO-SVMR and MSR-LULB, with MSR-LULB providing prediction results in the form of intervals. Table 3 shows a comparison of the aforementioned two methodologies and the predicted outcomes in this paper.

Table 3 Comparison between the traditional method and the predicted results in this paper (unit: \( 10^4 m^3\))

As shown in Table 3, the bias of the value predicted by the uncertain time series model is smaller than that of the LASSO-SVMR method, while its prediction interval is wider than that of the MSR-LULB method. However, the prediction interval of the approach used in this study covers the actual value, whereas the actual value exceeds the MSR-LULB prediction interval. Although MSR-LULB produces a tighter prediction interval, missing the actual value might lead to incorrect conclusions. Hence, for WSMC project decision-makers, the UHWD forecasts produced by uncertain time series analysis can yield better results than traditional approaches. Moreover, compared with the LASSO-SVMR and MSR-LULB methods, the uncertain time series model is easy to apply to UHWD data.

4.2 Analysis of the UHWD prediction of Handan

Handan’s predicted UHWD value in 2018 is higher than in 2017. However, the scale of the increase is small, and the overall trend of Handan’s annual UHWD is negative. The following discussion attempts to investigate the causes behind this trend.

According to residential water pricing statistics, water price has no discernible effect on household water use in Handan City. Since October 1, 2006, the water price has risen from 1.8 yuan\( / m^{3} \) to 2.95 yuan\( / m^{3} \), and by June 1, 2017, it had increased to 4.22 yuan\( / m^{3} \). In contrast, the UHWD of Handan City began to decline between 2006 and 2017. As a result, the adjustment in water price had no effect on UHWD in Handan.

Handan’s urban population expanded consistently and modestly between 1993 and 2013. Household water use per capita decreased dramatically over this period, though it has recently increased marginally. As a result, population growth has not led to an increase in Handan’s UHWD. We collected data on Handan’s household water use per capita from 1993 to 2017. An assessment of the linear relationship between household water consumption per capita and Handan’s total UHWD shows that the two series have a strong linear association, with a correlation coefficient of 0.9098. The correlation was stronger before the urban area changed in 2016 and weakened afterward. As a result, the decrease in water use per capita is the primary cause of the decline in Handan’s UHWD.

In addition to influencing factors such as climate and economy, one of the key explanations for the decrease in Handan’s UHWD might be residents’ increased understanding of water-saving practices. One of Handan’s main water sources is the South-to-North Water Diversion project. The price of raw water from the South-to-North Water Diversion project is 2.51 yuan\( / m^{3} \), a big increase from the 0.3 yuan\( / m^{3} \) of raw water from Yuecheng Reservoir. The South-to-North Water Diversion project had some outstanding problems, such as the high cost of water use and insufficient water consumption. To solve these problems effectively, the Hebei provincial government issued a policy in 2016 to improve the efficiency of household water consumption. This policy also aimed to raise awareness of water conservation across the whole society.

Another major cause might be the role of WSMC projects in spreading and driving water-saving awareness. In December 2014, Beijing Cathay water-saving service company signed an agreement with Hebei University of Engineering (HUE) to start the WSMC project. HUE is located in Handan, and its WSMC project saves about \( 45 \% \) of water consumption every year. Students, teachers, and citizens are all aware of the significance of conserving water. This phenomenon has a significant beneficial impact on urban water conservation.

5 Conclusion

UHWD prediction is an important step in making an optimal water-saving strategy. Based on the observed series data of Handan, an uncertain time series model was presented in this paper to predict the UHWD. The results show that the residuals are not from the same population and that the estimated distribution function of the disturbance term is not close enough to the observed frequency. Hence, for the water demand data, the disturbance term should be treated as an uncertain variable rather than a random variable. At the same time, compared with traditional UHWD prediction models, the uncertain time series model achieves higher prediction accuracy and is easier to use. As a result, uncertain time series analysis is suitable for predicting Handan city’s UHWD. In future research, more types of uncertain time series models can be studied in theory to meet different forecasting needs. In practice, the uncertain time series model can be applied to predict and analyze time series data in more scenarios to improve prediction accuracy.