Introduction

In any water distribution system, water demand is the principal driver, and effective water demand forecasting supports effective water management in an urban metropolis. Accurate water demand forecasting helps in better urban planning and sustainable development [1, 2]. Water demand in a city is influenced by many factors, such as the population of the city, its industrial establishments, and climatic conditions [3, 4]. Predicting water demand accurately in an urban city is therefore a challenging task. Water demand forecasting can be classified into three types based on the time horizon considered: short-term, medium-term, and long-term forecasting [5]. Medium- and long-term water demand forecasting may assist in the planning and development of reservoir systems and water networks [3, 6]. Short-term water demand forecasting, also called operational forecasting, provides an estimate of water demand over short horizons ranging from a few days to weeks. Short-term demand forecasts enable real-time management of water distribution network devices.

Traditionally, researchers have used various techniques to model water demand, such as regression, autoregressive integrated moving average (ARIMA), and artificial intelligence (AI) models [7,8,9,10]. Conventional techniques, particularly linear regression-based models, are known for being simple to understand and implement. They have been the primary focus of research and are extensively used in practical applications [11]. On the other hand, they appear to lack accuracy in forecasting water demand, which can create serious problems in the smooth running of the network system. Moreover, water demand changes non-linearly because of non-linear changes in water consumption, temperature variations, and holidays that affect urban water usage [12], and it cannot be accurately predicted by linear approaches [13]. In the realm of water demand forecasting, several AI models have been explored in previous studies, such as the artificial neural network (ANN) [5, 6], support vector machines (SVM) [13,14,15,16], SVM combined with the Fourier method [14], random forests [15], extreme learning machines (ELM) [15, 17], ELM in conjunction with wavelet-based ANN [18], system dynamics modeling (SDM) [19], an ensemble wavelet–bootstrap machine-learning approach [20], singular spectrum analysis coupled with neural networks [21], and the adaptive neuro-fuzzy inference system (ANFIS) [22]. In general, these learning techniques have shown successful results and are widely applied in water demand forecasting.

However, growing urbanization and the consequences of climate change cause considerable uncertainty in the time-series data, making forecasting more difficult. This has inspired experts to improve existing models and develop new ones, including deep learning (DL) techniques. In contrast to traditional methods for water demand forecasting, DL takes previous data and automatically recognizes trends through successive layers of representation, an approach also known as hierarchical representation learning [23]. In recent years, DL techniques have shown remarkable and promising results in many fields such as sentiment analysis [24, 25], facial recognition [26, 27], natural language processing [28], weather prediction [29], and time-series analysis [30]. A DL network allows the use of hundreds or even thousands of hierarchical layers, and it learns autonomously from past data [31]. A convolutional neural network (CNN) can effectively extract the features of an urban water consumption dataset when applied to the urban water demand forecasting problem [32].

The achievements of DL models in various fields have inspired researchers to investigate and implement DL models for forecasting water demand. Reviewing recent work on deep learning for water demand forecasting, we found that Guo et al. [33] used the gated recurrent unit network (GRUN) model to forecast short-term water demand, and Salloom [34] used the GRUN for real-time water demand forecasting in China. Another study used long short-term memory (LSTM) neural networks combined with wavelet transform and principal component analysis for daily urban water demand prediction [35], and the deep belief network (DBN) technique was used for modeling daily urban water demand [36]. Moreover, researchers applied a continuous deep belief echo state network (CDBESN) for hourly urban water demand forecasting and found that the developed model outperformed the echo state network (ESN) and support vector regression (SVR) models used in that study [37]. A recent study also addressed the challenge of predicting household water consumption for four different water use types (apartment, restaurant, detached house, and elementary school) with non-linear patterns (weekdays and weekends as explanatory variables) using deep learning-based LSTM models [38]. Furthermore, the study by Kavya [39] focused on short-term water demand forecasting using nine machine learning and deep learning models. The results indicate that deep learning models, particularly the LSTM model, outperform machine learning models in both univariate and multivariate scenarios. The outcomes were compared with the ANN, SVR, and conventional ARIMA models.
Researchers have stated that the LSTM has gained considerable attention in the field of time-series forecasting; in particular, the bi-directional LSTM (BiLSTM) introduced an architecture for better learning [40], and the prediction error is reduced by using the hybrid CNN–BiLSTM model [12]. Moreover, the hybrid model can incorporate historical water data and climatic factors, resulting in improved prediction accuracy compared to stand-alone models (such as LSTM, BiLSTM, CNN, GRUN, and ANN). In addition, it demonstrates shorter training time and faster convergence in comparison to other models [12].

The objectives of the present study are: (i) to evaluate the predictive efficacy of DL models, namely the CNN, LSTM, and BiLSTM models, and (ii) to develop the hybrid models CNN–LSTM and CNN–BiLSTM, which combine the advantages of the stand-alone models (CNN and LSTM) and of the bidirectional approach, respectively. To the best of our knowledge, the potential of the hybrid CNN–LSTM and CNN–BiLSTM deep learning models for water demand prediction has not been explored yet. Further, an uncertainty analysis has been carried out for all the models used in this study for better evaluation of model limitations.

Study Area and Data Collection

The daily water consumption dataset for the city of London, Canada (study area shown in Fig. 1), covering 1 July 2009 to 2 September 2020, has been used to develop the urban water demand forecasting models. The daily water demand data are computed from water billing data and provided by the Municipal Artificial Intelligence application lab of the Information Technology Services division (https://github.com/aildnont/water-forecast). The variation of the daily water consumption in the city can be observed in Fig. 2; consumption reaches a peak between March and November, with a few exceptions during the year 2014 and the beginning of 2017. The descriptive statistics of the water demand data for the city of London, including the minimum, maximum, mean, and standard deviation, can be found in Table 1.

Fig. 1 Study area map showing the city of London, Canada

Fig. 2 Time series of the daily water demand for the city of London, Canada

Table 1 Descriptive statistics of the water demand data for the city of London, Canada

Methodology

This section describes the state-of-the-art deep learning methods that have been proposed for time-series forecasting of water resources variables related to water quantity.

Neural Network and Its Forms

The state-of-the-art deep learning models have evolved from the long-established artificial neural network [41], with different levels of complexity in structure, parameterization and run time. The models are subsequently evaluated for water demand forecasting 1 to 15 days ahead. Artificial neural networks learn the non-linear input–output relationship in the observed parameters through weighted synaptic connections [42]. They consist of an input layer, one or more processing layers, and an output layer, each represented by nodes/neurons. The model performance depends on the historical data used for training and on the method for determining the weights and functions for inputs and nodes during training [43]. Each neuron in the hidden layer has an internal activation function (transfer function), which establishes the relationship between the weighted inputs and the outputs.

Furthermore, the structure of the recurrent neural network (RNN) is designed to handle sequential or ordered data, in which successive items relate to and follow each other. An RNN has an internal state (memory) that allows short- or long-term outputs to be used as inputs [44]. RNNs can process input data sequences using their internal hidden memory, and as a result, they can be used for applications such as handwriting identification, speech recognition, and time-series analysis. In an RNN, the information cycles through a loop, which makes the inputs related to each other. This helps the RNN to examine the input and output sequences while making a decision. Figure 3 shows an unfolded architecture of a simple RNN, and the mathematical expressions are as follows:

Fig. 3 Typical recurrent neural network structure

\(x_{t}\) is the input at time t, and \(s_{t}\) is the hidden memory at time t, which is estimated from the current input and the preceding hidden memory. \(s_{t}\) can be formulated as

$$s_{t} = f(Ux_{t} + Ws_{t - 1} )$$
(1)

where f is the activation function, a non-linear transformation applied to the weighted input before it is passed to the next layer of neurons or returned as output. There are different types of activation functions used in deep learning, such as the sigmoid function, step function, tanh, and rectified linear unit (ReLU). The initial hidden memory \(s_{0}\), needed to estimate the first hidden memory \(s_{1}\), is usually set to 0. \(o_{t}\) is the outcome at time t and is calculated by

$$o_{t} = f(Vs_{t} )$$
(2)

U, V and W are the weights of the hidden layer, the output layer, and the hidden memory, respectively.
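The recurrence in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration only, assuming scalar weights and tanh as the activation function; the function name `rnn_forward` and the numerical values are hypothetical, and a practical network uses weight matrices and vector-valued states.

```python
import math


def rnn_forward(xs, U, W, V, f=math.tanh):
    """Apply Eqs. (1)-(2) to a scalar input sequence xs.

    U, W and V are scalar weights here (illustrative values only);
    real networks use weight matrices and vector states.
    """
    s = 0.0  # initial hidden memory s_0
    states, outputs = [], []
    for x in xs:
        s = f(U * x + W * s)  # Eq. (1): new hidden memory
        o = f(V * s)          # Eq. (2): outcome at this time step
        states.append(s)
        outputs.append(o)
    return states, outputs


states, outputs = rnn_forward([1.0, 0.5, -0.2], U=0.5, W=0.1, V=1.0)
```

Because each hidden memory depends on the previous one, the second state already carries information from the first input, which is the property the text describes as the inputs being related to each other.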

Despite its advantages, the RNN has some disadvantages, such as the vanishing and exploding gradient problems [45, 46] and an inability to retain much older past information, even when a different activation function is used.

Long Short-Term Memory (LSTM) as a Form of RNN

LSTM is an upgraded kind of RNN that was designed to model sequential data and their long-term dependencies more precisely than the traditional RNN. The LSTM was introduced by Hochreiter and Schmidhuber [47] and was designed to overcome the vanishing gradient problem. The LSTM architecture consists of a memory cell and three multiplicative gating units: an input gate, which adds information to the cell state memory; a forget gate, which eliminates the data that are no longer essential to the model; and an output gate, which selects the information to be shown as output. Figure 4 shows a simple LSTM architecture.

$$f_{t} = \sigma (W_{f} \times [h_{t - 1} ,x_{t} ] + b_{f} )$$
(3)
$$i_{t} = \sigma (W_{i} \times [h_{t - 1} ,x_{t} ] + b_{i} )$$
(4)
$$\tilde{C}_{t} = \tanh (W_{C} \times [h_{t - 1} ,x_{t} ] + b_{C} )$$
(5)
$$C_{t} = f_{t} C_{t - 1} + i_{t} \tilde{C}_{t}$$
(6)
$$o_{t} = \sigma (W_{o} \times [h_{t - 1} ,x_{t} ] + b_{o} )$$
(7)
$$h_{t} = o_{t} \tanh (C_{t} )$$
(8)
Fig. 4 LSTM architecture

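In the above equations, \(f_{t}\), \(i_{t}\) and \(o_{t}\) are the activations of the forget, input and output gates, \(\tilde{C}_{t}\) is the candidate cell state, \(C_{t}\) is the cell state, \(h_{t}\) is the hidden state, \(\sigma\) is the sigmoid function, and \([h_{t - 1} ,x_{t} ]\) is the concatenation of the previous hidden state and the current input. All weights (W) and biases (b) are learned during the model development period.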
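To make the gate interactions in Eqs. (3)–(8) concrete, a single LSTM time step can be sketched as follows. This is a minimal sketch under simplifying assumptions: the input, hidden state and cell state are scalars, and the parameter dictionary `p` and the helper names are hypothetical. It is not the Keras implementation used in the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of Eqs. (3)-(8) for scalar x_t, h_{t-1} and C_{t-1}.

    p maps each gate name ("f", "i", "c", "o") to a tuple (w_h, w_x, b):
    the weights applied to [h_{t-1}, x_t] and the bias (illustrative values).
    """
    def pre(name):
        w_h, w_x, b = p[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))            # Eq. (3): forget gate
    i_t = sigmoid(pre("i"))            # Eq. (4): input gate
    c_hat = math.tanh(pre("c"))        # Eq. (5): candidate cell state
    c_t = f_t * c_prev + i_t * c_hat   # Eq. (6): updated cell state
    o_t = sigmoid(pre("o"))            # Eq. (7): output gate
    h_t = o_t * math.tanh(c_t)         # Eq. (8): hidden state
    return h_t, c_t


params = {"f": (0.1, 0.2, 0.0), "i": (0.3, 0.1, 0.0),
          "c": (0.2, 0.4, 0.0), "o": (0.1, 0.1, 0.0)}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, p=params)
```

The additive form of Eq. (6) is what lets information persist across many steps: when the forget gate stays close to 1, the previous cell state passes through almost unchanged.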

One-Dimensional Convolutional Neural Network (1D-CNN)

Convolutional Neural Network (CNN) is a DL neural structure generally used for image recognition [48, 49]. A CNN is made up of a series of convolutional layers, the output of which is only linked to local regions in the input, allowing the network to learn filtering that detects certain patterns in the input data [45, 49]. An architecture of CNN for one-dimensional time series for water demand prediction has been used in this current study (Fig. 5).

Fig. 5 CNN architecture for water demand prediction model

A 1D-CNN consists of a convolutional hidden layer that operates over a one-dimensional (1D) sequential input layer. In situations such as very long input sequences, a second convolutional layer may be added. A pooling layer then follows the convolutional layer(s) and distils their outputs to the most salient elements. A flatten layer is used between the convolutional layers and the dense layer to reduce the feature maps to a single one-dimensional vector, and the dense fully connected layer interprets the features extracted by the convolutional part of the model. In this study, we used a convolutional layer with 64 filters, a kernel of size 2, and the rectified linear unit (ReLU) as the activation function. The extracted features then pass through a max-pooling layer and a dense layer, and finally, the output layer predicts a single numerical value. The model is compiled with the Adam optimization algorithm, and the mean-squared error (MSE) is used as the loss function.
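The two core operations of the 1D-CNN, convolution with a kernel of size 2 followed by ReLU and max pooling, can be illustrated with a short sketch. A single filter is shown instead of 64, the kernel weights are arbitrary, and the pool size of 2 is an assumption, since the pool size is not reported above.

```python
def conv1d_relu(seq, kernel, bias=0.0):
    """Slide one kernel over seq (no padding, stride 1) and apply ReLU."""
    k = len(kernel)
    feature_map = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        z = sum(w * x for w, x in zip(kernel, window)) + bias
        feature_map.append(max(0.0, z))  # ReLU activation
    return feature_map


def max_pool1d(feature_map, pool_size=2):
    """Keep the largest value in each non-overlapping window."""
    return [max(feature_map[i:i + pool_size])
            for i in range(0, len(feature_map) - pool_size + 1, pool_size)]


# Four lagged demand values (illustrative numbers) and one kernel of size 2
lags = [1.0, 2.0, 4.0, 3.0]
features = conv1d_relu(lags, kernel=[-1.0, 1.0])
pooled = max_pool1d(features)
```

With four lagged inputs and a kernel of size 2, each filter produces three feature values, and a pool size of 2 reduces them to one value per filter before the flatten and dense layers.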

BiLSTM Model

The conventional LSTM processes data in one direction only, which has been cited as a factor limiting the efficiency of the LSTM model in predicting water demand [12]. However, a sequence read in the reverse direction may also contain valuable information. To overcome this limitation, the bi-directional LSTM (BiLSTM) incorporates the sequential information from both the forward and reverse directions of the dataset, as depicted in Fig. 6.

Fig. 6 BiLSTM model architecture

This means that the past information in the input sequence is received by the forward LSTM, whereas the future data points are obtained by the reverse LSTM [50]. The BiLSTM architecture uses a gate structure similar to that of the LSTM, but the number of weights and biases is doubled in comparison with the LSTM. During training, the two directions are trained separately, and their outputs are finally fused as follows:

$${O}_{t}=\overrightarrow{{O}_{t}} \int {\overleftarrow{O}}_{n-t+1}$$
(9)

where \(\overrightarrow{{O}_{t}}\) and \({\overleftarrow{O}}_{n-t+1}\) are the outputs of the forward and backward directions, respectively; \(\int .\) denotes the integration operator that merges the two outputs; \(n\) is the length of the input sequence; and \({O}_{t}\) is the predicted output at time t.
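The index \(n-t+1\) in Eq. (9) reflects that the backward pass reads the sequence in reverse, so its outputs must be re-aligned before merging. The sketch below illustrates only this alignment. It assumes a toy recurrent step (a running sum) in place of an LSTM cell and merges the two directions by pairing them, which corresponds to concatenation; both choices are illustrative assumptions.

```python
def bidirectional(seq, step, merge=lambda fwd, bwd: (fwd, bwd)):
    """Run a recurrent step forward and backward, then align and merge.

    step(state, x) returns the new state, which doubles as the output here.
    """
    def run(values):
        state, outs = 0.0, []
        for x in values:
            state = step(state, x)
            outs.append(state)
        return outs

    n = len(seq)
    fwd = run(seq)        # forward outputs for t = 1..n
    bwd = run(seq[::-1])  # backward outputs, in reversed reading order
    # Backward output at step n-t+1 (1-based) corresponds to time t,
    # i.e. index n-1-t when both t and the list index are 0-based.
    return [merge(fwd[t], bwd[n - 1 - t]) for t in range(n)]


merged = bidirectional([1.0, 2.0, 3.0], step=lambda s, x: s + x)
```

At the first time step the forward output has seen only the first value, whereas the aligned backward output has already seen the whole sequence; this is how the merged output at time t combines past and future context.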

Hybrid CNN–LSTM and CNN–BiLSTM Models

When advanced ML models such as CNN, LSTM and BiLSTM are used individually, each network structure has its specific advantages. Previous studies suggest that the CNN and LSTM (as well as BiLSTM) models are suitable stand-alone models on which to build a hybrid approach [12]. The LSTM model evolved to capture long-term dependencies in the data, which the simple RNN handles poorly because of its vanishing and exploding gradient issues. In this study, hybrid DL models, named CNN–LSTM and CNN–BiLSTM, have been developed by integrating the advantages of these stand-alone models to predict water demand in the city of London, Canada. The CNN–LSTM model consists of an upper CNN part, with an input layer, several hidden layers and an output layer, which extracts the features that are given as input to the LSTM cells. The hidden layers typically include a convolution layer, which has already been discussed under “One-Dimensional Convolutional Neural Network (1D-CNN)”. The topological architecture of the CNN–LSTM model is presented in Fig. 7a.

Fig. 7 a Hybrid CNN–LSTM and b CNN–BiLSTM models

The other hybrid model, CNN–BiLSTM, combines the CNN with the BiLSTM and thereby merges the benefits of the stand-alone CNN and BiLSTM models. The input vector for the CNN contains the water demand at different time lags; the outcome of the CNN is then fed to the BiLSTM network, and finally, the fully connected layers generate the predicted output. The complete architecture of the proposed CNN–BiLSTM model for forecasting water demand is shown in Fig. 7b.

Model Development and Hyperparameters Optimization

The stand-alone and hybrid models have been developed using Python 3.6.10, TensorFlow 2.1.0 and the Keras deep learning library on Windows 7 [51]. While developing a deep learning model, it is crucial to select a suitable loss function as well as an optimizer. A loss function (or objective function) is one of the essential parameters required to compile a model and to evaluate how well the predictions fit the observed values. There are different types of loss functions, such as the cross-entropy loss, mean-squared error (MSE), Huber loss, and hinge loss. However, most studies use MSE as the loss function to train deep learning models for regression or prediction tasks. We trained the 1D-CNN and LSTM models with Adaptive Moment Estimation (Adam), stochastic gradient descent (SGD), Root Mean Square Propagation (RMSprop), and the Adaptive Gradient algorithm (AdaGrad); Adam, an adaptive learning rate optimization procedure, performed best among these optimization algorithms. Tuning the hyperparameters of a deep learning model is vital for a good input–output mapping, and we tuned the hyperparameters of each model using the options available in Keras. Of the available data from 2009 to 2020, 70% are used for training, and the remaining data are divided equally between validation and testing. Each model has a batch size of 10 and is trained for 500 epochs. As input selection is one of the most cumbersome and essential tasks in model development, we select the input combination through autocorrelation and partial autocorrelation analysis of the time series. Further, all the DL models used in this study have been developed using the optimal input combination.
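The data preparation described above can be sketched as two small helpers: one builds lagged input windows with a target at a chosen lead time, and the other performs the 70/15/15 split. The helper names are hypothetical, and two details are assumptions rather than statements from this study: the split is taken in chronological order, and the lead time is counted from the last value in the input window.

```python
def make_supervised(series, n_lags=4, lead=1):
    """Build (inputs, target) pairs from a univariate series.

    Each input holds n_lags consecutive values; the target lies `lead`
    steps after the last value of the window (assumed convention).
    """
    X, y = [], []
    for t in range(n_lags, len(series) - lead + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + lead - 1])
    return X, y


def chronological_split(samples, train_pct=70):
    """Split samples in time order: train_pct% for training and the
    remainder divided equally between validation and testing."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = (n - n_train) // 2
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


demand = [float(v) for v in range(20)]  # placeholder series, not real data
X, y = make_supervised(demand, n_lags=4, lead=7)
X_train, X_val, X_test = chronological_split(X)
```

Keeping the samples in time order avoids evaluating the models on days that precede the training period.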

Model Performance Evaluation

In the study, the performance of the developed models was evaluated using various statistical measures such as Correlation coefficient (r), Mean Absolute Error (MAE), Nash–Sutcliffe Efficiency (NSE), Scatter Index (SI), Mean Bias Error (MBE), and Discrepancy Ratio (DR) [52,53,54]:

$$r = \frac{{\sum\nolimits_{i = 1}^{N} {\left( {{\text{WD}}_{{{\text{Obs}}}} - \overline{{{\text{WD}}_{{{\text{Obs}}}} }} } \right)\left( {{\text{WD}}_{{{\text{Pred}}}} - \overline{{{\text{WD}}_{{{\text{Pred}}}} }} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {\left( {{\text{WD}}_{{{\text{Obs}}}} - \overline{{{\text{WD}}_{{{\text{Obs}}}} }} } \right)}^{2} } \sqrt {\sum\nolimits_{i = 1}^{N} {\left( {{\text{WD}}_{{{\text{Pred}}}} - \overline{{{\text{WD}}_{{{\text{Pred}}}} }} } \right)}^{2} } }}$$
(10)
$${\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {{\text{WD}}_{{{\text{Obs}}}} - {\text{WD}}_{{{\text{Pred}}}} } \right|}$$
(11)
$${\text{MBE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {{\text{WD}}_{{{\text{Pred}}}} - {\text{WD}}_{{{\text{Obs}}}} } \right)}$$
(12)
$${\text{NSE}} = 1 - \left[ {\frac{{\sum\nolimits_{i = 1}^{N} {({\text{WD}}_{{{\text{Obs}}}} - {\text{WD}}_{{{\text{Pred}}}} )^{2} } }}{{\sum\nolimits_{i = 1}^{N} {({\text{WD}}_{{{\text{Obs}}}} - \overline{{{\text{WD}}_{{{\text{Obs}}}} }} )^{2} } }}} \right]$$
(13)
$$\mathrm{SI} = \sqrt{\frac{\frac{1}{N}\sum_{i=1}^{N}\left[\left(\mathrm{WD}_{\mathrm{Pred}} - \overline{\mathrm{WD}_{\mathrm{Pred}}}\right) - \left(\mathrm{WD}_{\mathrm{Obs}} - \overline{\mathrm{WD}_{\mathrm{Obs}}}\right)\right]^{2}}{\frac{1}{N}\sum_{i=1}^{N}\mathrm{WD}_{\mathrm{Obs}}}}$$
(14)
$$\mathrm{DR} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{\mathrm{WD}_{\mathrm{Pred}}}{\mathrm{WD}_{\mathrm{Obs}}}\right] - 1$$
(15)

where \(\mathrm{WD}_{\mathrm{Obs}}\) is the observed water demand, \(\mathrm{WD}_{\mathrm{Pred}}\) is the predicted water demand, \(\overline{\mathrm{WD}_{\mathrm{Obs}}}\) and \(\overline{\mathrm{WD}_{\mathrm{Pred}}}\) are the means of the observed and predicted water demand, respectively, and N is the number of data points.
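For reference, the six measures in Eqs. (10)–(15) can be computed with the short function below. It is a sketch that follows the equations as printed (in particular, the SI takes the square root of the whole ratio, as in Eq. 14); the function name and the sample values are hypothetical. NSE is returned as a fraction rather than a percentage.

```python
import math


def evaluate(obs, pred):
    """Return r, MAE, MBE, NSE, SI and DR for paired observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    centred_sq = sum(((p - mean_p) - (o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return {
        "r": cov / math.sqrt(ss_o * ss_p),                            # Eq. (10)
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,        # Eq. (11)
        "MBE": sum(p - o for o, p in zip(obs, pred)) / n,             # Eq. (12)
        "NSE": 1.0 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / ss_o,  # Eq. (13)
        "SI": math.sqrt((centred_sq / n) / mean_o),                   # Eq. (14)
        "DR": sum(p / o for o, p in zip(obs, pred)) / n - 1.0,        # Eq. (15)
    }


scores = evaluate(obs=[10.0, 20.0, 30.0], pred=[11.0, 19.0, 32.0])
```

A positive MBE or DR indicates overprediction on average, and a negative value indicates underprediction, which is why these two measures complement the sign-free MAE.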

Uncertainty in Water Demand Utility

Probabilistic forecasts are important in the management of water distribution systems because they attach a confidence level to advance predictions of water quantity variables. The uncertainty of model outputs can be analyzed using quantile regression. The residual in normalized form (NR) can be represented as a linear function of the water demand prediction in normalized form (ND) as follows [55, 56]:

$$\mathrm{NR}=m\times \mathrm{ND}+c$$
(16)

where m and c are the slope and intercept parameters. While doing the regression analysis, the sum of the weighted absolute residuals is minimized using an objective function,

$$\mathrm{min}\sum_{i=1}^{n}{\rho }_{\varepsilon }\left|{\mathrm{NR}}_{\varepsilon }-\left({m}_{\varepsilon }{\mathrm{ND}}_{\varepsilon }+{c}_{\varepsilon }\right)\right|$$
(17)

where ε is the desired quantile, and \({\rho }_{\varepsilon }\) is the quantile regression function for adjustment of NRε.

Quantile regression was carried out using the ‘quantilereg’ function in MATLAB. Using the calibration dataset, the regression between NR and ND at the quantiles ε = 95%, 5%, 75% and 25% was analyzed to obtain the regression lines for NRε. The m and c parameters were then applied to the forecasted water demand to obtain the residuals in the Gaussian domain, NRε, at the different quantiles. The estimated error quantile in the original domain, Rε, was combined with the forecasted WD as follows:

$${\mathrm{WD}}_{\varepsilon }=\mathrm{WD}+{R}_{\varepsilon }$$
(18)

This procedure yields the relationships at the desired quantiles for the calibration data in the original domain; they can be applied to any forecasted value by linear interpolation or, if the forecasted value lies outside the domain of the calibration dataset, by linear extrapolation. Similar models were derived for the forecasts of the developed models at each lead time.
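The objective in Eq. (17) can be illustrated with a small, self-contained sketch. It is not the MATLAB routine used in this study. It assumes the usual asymmetric weighting, ε for positive residuals and 1 − ε for negative ones, and it exploits the known property that an optimal quantile regression line passes through at least two data points, so that a brute-force search over point pairs suffices for small samples. The function names are hypothetical.

```python
from itertools import combinations


def pinball(u, eps):
    """Weighted absolute residual: eps*|u| if u >= 0, otherwise (1 - eps)*|u|."""
    return eps * u if u >= 0 else (eps - 1.0) * u


def fit_quantile_line(nd, nr, eps):
    """Return (m, c) minimising the sum of pinball(NR - (m*ND + c), eps).

    Brute force over all pairs of points with distinct ND values;
    suitable only for small illustrative samples with at least two such points.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(nd, nr), 2):
        if x1 == x2:
            continue
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        loss = sum(pinball(y - (m * x + c), eps) for x, y in zip(nd, nr))
        if best is None or loss < best[0]:
            best = (loss, m, c)
    return best[1], best[2]


# Illustrative normalized forecasts (ND) and residuals (NR), not real data
nd = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
nr = [0.0, 1.0, -1.0, 2.0, -2.0, 3.0]
m95, c95 = fit_quantile_line(nd, nr, eps=0.95)
m05, c05 = fit_quantile_line(nd, nr, eps=0.05)
```

Fitting the lines at ε = 0.95 and ε = 0.05 gives the upper and lower limits of a 90% band, and the lines at 0.75 and 0.25 give the 50% band.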

Results and Discussion

With a time series of water demand available for about 12 years, the data were first checked for stationarity. The Augmented Dickey–Fuller (ADF) test, a form of the Dickey–Fuller test [57], was conducted (ADF statistic: − 5.913). It gave a p value (2.605e−07) below 5%, so the null hypothesis that the data are non-stationary was rejected. Hence, without any pre-processing, the univariate time series of daily water demand was analyzed to find its autocorrelation and partial autocorrelation with respect to time lag (Fig. 8). The decaying trend of the autocorrelation and the partial autocorrelation relative to the 95% confidence bound indicated the dominance of the autoregressive (AR) component in the dataset. Hence, the water demand of the previous 4 days, i.e., from WDo (t − 1) up to WDo (t − 4), was used as the input to the forecasting models. WDo is the observed water demand at the different time lags, and WDf denotes the water demand forecast from the model.

Fig. 8 Autocorrelation and partial autocorrelation analysis for the water demand time series
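The lag analysis can be illustrated with a sample autocorrelation function. The sketch below uses the standard estimator and the common large-sample approximation ±1.96/√N for the 95% bound under a white-noise assumption; whether this exact bound was used for Fig. 8 is an assumption. The partial autocorrelation is omitted for brevity.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def confidence_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelation of white noise."""
    return z / math.sqrt(n)


demand = [30.0, 32.0, 35.0, 33.0, 31.0, 34.0, 36.0, 35.0]  # illustrative values
acf = [autocorrelation(demand, k) for k in range(5)]
bound = confidence_bound(len(demand))
```

Lags whose autocorrelation lies outside the bound are candidates for inclusion as model inputs.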

Strategic Modeling for Water Demand Forecasting

Initially, the models are trained with the observed daily water demand WDo (t) as input and the next-day water demand WDf (t + 1) as output. Further, the models are trained and tested for forecasts at WDf (t + 1), WDf (t + 7) and WDf (t + 15) days; such short-term demand forecasts (a week or more in advance) are essential for the management of water network systems.

The model training was conducted with hyperparameter tuning for the CNN, LSTM, BiLSTM and hybrid models. The time series of the training data and the corresponding predictions can be found in Fig. 9 (only CNN–BiLSTM is included). The time series of observed and forecasted water demand during the testing phase for the stand-alone CNN and LSTM models are shown in Fig. 10. Figure 11 displays the time series for the CNN–BiLSTM model at different lead times throughout the testing period. The forecast accuracy achieved by the stand-alone models on the training, validation and testing datasets is summarized in Table 2, and Table 3 provides the forecast accuracy of the hybrid models. Furthermore, performance measures such as SI, MBE, and DR were used to assess the prediction of water demand across all the models [54, 58]. All models performed consistently well during training, except that the CNN model showed an efficiency below 90% at the 15-day lead. The other models captured the continuous events of daily water demand with NSE > 90% even up to the 15-day lead forecasting mode in terms of peak demand and time of the event.

Fig. 9 Training performance of the CNN–BiLSTM model

Fig. 10 Visualization of forecasting performance of LSTM over the CNN model

Fig. 11 Improved forecasting performance of CNN–BiLSTM model at different lead times

Table 2 The stand-alone model performance for water demand forecasts
Table 3 The hybrid model performance for water demand forecasts

From Tables 2 and 3, it is seen that the observed values are closely captured during testing at 1-day-ahead forecasting by all the stand-alone DL models. The performance of the stand-alone DL models is as follows: MAE = 0.487 ML/day, NSE = 99.41%, r = 0.998, SI = 0.563, MBE = − 5.288, and DR = 0.049 for the CNN model; MAE = 0.537 ML/day, NSE = 99.46%, r = 0.999, SI = 0.064, MBE = − 0.49, and DR = 0.004 for the LSTM model; and MAE = 0.514 ML/day, NSE = 99.44%, r = 0.999, SI = 0.065, MBE = − 0.469, and DR = 0.004 for the BiLSTM model. During the testing period, the hybrid CNN–LSTM model shows MAE = 0.320 ML/day, NSE = 99.734%, r = 0.999, SI = 0.055, MBE = 0.101, and DR = − 0.001, and the CNN–BiLSTM model achieves MAE = 0.245 ML/day, NSE = 99.830%, r = 0.999, SI = 0.064, MBE = − 0.209, and DR = 0.002. Based on NSE, the models rank as CNN–BiLSTM > CNN–LSTM > CNN > BiLSTM > LSTM in training, CNN–LSTM > LSTM > BiLSTM > CNN–BiLSTM > CNN in validation, and CNN–BiLSTM > CNN–LSTM > LSTM > BiLSTM > CNN in testing. All the predictive models considered herein are reliable, as the NSE values are in the acceptable range of 0.75–1. Moreover, the hybrid CNN–LSTM model exhibits improved SI and DR compared to the stand-alone models, indicating that the hybrid approach forecasts water demand more precisely.

Afterwards, the input scenario with WD (t − 1) to WD (t − 4), i.e., the water demand of 1 to 4 days before, was considered in building and re-training the models. All the selected models were customized for multi-step-ahead water demand forecasting, and the lead-time water demand was simulated as output using the trained models. For this forecasting task, we have summarized the models for 1-day, 7-day and 15-day lead-time water demand prediction (see Tables 2 and 3). The CNN model forecasts poorly at longer leads because it has no memory structure and therefore loses its capacity to capture the data, although it has strong feature selection layers. In contrast, the recurrent-type LSTM and BiLSTM networks, as well as their hybrids with the CNN, include a memory structure and perform well in the forecasting problems. For 7-day-ahead water demand forecasts, the NSE of the CNN model reduced to 65.34%, and the NSE values of the other models are 95.51% for LSTM, 95.63% for BiLSTM, 95.641% for CNN–LSTM, and 95.655% for CNN–BiLSTM. The stand-alone BiLSTM model is almost as efficient as the CNN–LSTM hybrid model for the 7-day- and 15-day-ahead forecasts, which indicates the robustness of the BiLSTM structure as a stand-alone multi-step forecasting model. The CNN–BiLSTM model produced the 15-day forecast with NSE = 84.843%, followed by the BiLSTM > CNN–LSTM > LSTM > CNN models. The MAE of the forecasts by the CNN–BiLSTM model ranges from 0.245 to 2.541 ML/day for the 1- to 15-day leads, which reflects the consistency of the model at long leads, as the model keeps the memory layers updated. The MAE of the other models for the 1- to 15-day forecasts ranges as follows: CNN–LSTM: 0.320–2.483 ML/day, BiLSTM: 0.514–2.518 ML/day, LSTM: 0.537–2.519 ML/day, and CNN: 0.487–5.202 ML/day. Similarly, the SI of the CNN–LSTM and CNN–BiLSTM models ranges from 0.055 to 0.271 and from 0.064 to 0.272, respectively.
It can be observed that the SI values consistently increase from the 1-day forecast to the 15-day forecast. This trend indicates a higher level of scatter in the data as the forecast horizon extends, which can also be seen in the uncertainty plots (Fig. 12b–d). The results are in line with the suggestion by Hu et al. [12], as the hybrid approach of CNN and LSTM/BiLSTM models produced satisfactory results in forecasting. To assess the uncertainty of the models for lead-time forecasting, the quantile regression method has been applied using the training errors to find the parameters (Fig. 12a). The range between the 0.95 and 0.05 quantiles indicates a 90% CI band; similarly, the range between the 0.75 and 0.25 quantiles indicates a 50% CI band.

Fig. 12 a Plot of residuals vs forecasts during model training and b–d uncertainty plots for the CNN–BiLSTM forecasting

To represent the resulting uncertainty in water demand forecasting (for the testing results), the forecasts are shown as 90% CI and 50% CI bands in the area plots, and the corresponding observed water demand time series is shown as red points (Fig. 12b–d). The average width of the confidence interval bands is an indicator of the level of uncertainty. The CI bands are very narrow for the 1-day-ahead forecast because the next-day water demand is predicted accurately. At higher lead times, although the width of the CI band of the CNN–BiLSTM model increases, most of the observed values fall within the 90% CI band, which indicates the reliability of the model. At the 15-day lead, the CNN–BiLSTM model outperformed the stand-alone models and the hybrid CNN–LSTM model. The uncertainty bands of the CNN–BiLSTM model contain the observed water demand, with some underprediction during the peak water demand in 2020. The share of data points within the CI bounds varies from 88 to 90%, even up to the 15-day lead.
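The two band diagnostics discussed above, the share of observations inside a CI band and the average band width, can be computed as in the sketch below. The function names and the sample values are hypothetical; the lower and upper limits would come from Eq. (18) evaluated at the paired quantiles (for example, 0.05 and 0.95 for the 90% band).

```python
def band_coverage(obs, lower, upper):
    """Percentage of observations lying within [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(obs)


def mean_band_width(lower, upper):
    """Average width of the confidence band."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Illustrative values only (same units as the demand series)
observed = [10.0, 12.0, 15.0, 9.0]
lower_90 = [9.0, 11.0, 10.0, 10.0]
upper_90 = [11.0, 13.0, 14.0, 12.0]
coverage = band_coverage(observed, lower_90, upper_90)
width = mean_band_width(lower_90, upper_90)
```

A coverage close to the nominal level together with a narrow band indicates a forecast that is both reliable and sharp.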

Comparison of the Present Study with Previous Investigations

Comparing our study to previous investigations in the field of urban water demand prediction reveals several novel contributions and advancements [14, 59,60,61]. In contrast to earlier studies that predominantly relied on traditional statistical models or simpler machine learning algorithms, our research employed advanced deep learning models, specifically the CNN–LSTM and CNN–BiLSTM architectures [12]. This choice allowed us to capture complex spatial and temporal dependencies in the water demand data. Moreover, our study showcased superior performance metrics, including higher r, lower MAE, and improved NSE, particularly for shorter lead times [17, 33, 34]. In addition, other metrics such as SI, MBE, and DR were employed to precisely evaluate the prediction of water demand [54, 58]. These results can be attributed to the utilization of extensive and high-resolution datasets, which include additional relevant variables such as precipitation, temperature, atmospheric pressure, dewpoints, and humidity. Furthermore, these outcomes demonstrate the effectiveness of leveraging the strengths of deep learning techniques [12]. By surpassing previous investigations, our study contributes to the advancement of urban water demand prediction and highlights the potential of deep learning models in achieving accurate and reliable forecasts. Nonetheless, it is important to consider the generalizability of our findings to other regions and time periods.

Conclusions

Urban water demand forecasting is an essential component in the effective management of the available water in an urban city. It can help water managers in decision-making about water demand and supply in a city. In this study, we attempted to forecast the daily water demand of the Canadian city of London from 1 to 15 days in advance. We developed a novel hybrid CNN–BiLSTM model for water demand forecasting. Further, the efficacy of the hybrid models was compared with that of other DL models, viz., the LSTM, BiLSTM, and 1D-CNN models. A detailed stepwise experimental comparison of these forecasting procedures at different lead times has been carried out using our case study data. The outcomes indicate that the hybrid CNN–BiLSTM model produces the most accurate forecast, closely followed by the CNN–LSTM, BiLSTM, LSTM, and CNN models. The multi-step forecasting performance of the proposed hybrid CNN–BiLSTM model is due to the powerful feature extraction capability of the 1D-CNN combined with the bidirectional LSTM approach, which allows for better learning by training two LSTMs instead of one on the input sequence. The uncertainty analysis adds to the understanding of the model and improves the acceptability of the point forecasts. Overall, the results support the CNN–BiLSTM approach as a promising deep learning technique for accurate lead-time forecasting of urban water demand.