1 Introduction

Many countries now face numerous concurrent challenges in the management of, and access to, potable water. United Nations Development Programme (2013), Ferguson et al. (2013) and Hossain et al. (2018), among many others, have identified global warming and related climate change, manifested in the increased frequency and severity of drought and flooding, as one of the most significant pressures on the aquatic environment. As a result, considerable pressure is being placed on water infrastructure. It has also been reported that global warming introduces considerable uncertainty into long-term planning projections of water demand in urban areas (Urich and Rauch 2014). These uncertainties can lead to significant problems in related areas such as supply, operation and cost, which traditional planning methods cannot solve.

The aforementioned increasing concerns about the impact of climate change have led to the need to plan and manage water in advance, so that municipal water demand can be met to the satisfaction of the consumer (Zhang et al. 2019). This type of strategic planning, as conveyed by Cutore et al. (2008), means planning now for an uncertain future. However, since conventional models are no longer adequate to predict future urban water consumption under the pressure of climate change, several researchers have been investigating and improving mathematical models to better estimate essential parameters and better represent forecast uncertainties (Marlow et al. 2013).

Accurate water demand prediction can play an important role in optimising the design, operation and management of municipal water supply infrastructure (Pacchin et al. 2019). It can also minimise the uncertainty that results from a rapid increase in water demand due to the impact of climatic factors (Bougadis et al. 2005). Previous studies, such as Gato et al. (2007), Tian et al. (2016) and Brentan et al. (2017), have established that water consumption is affected by weather variables throughout the year. In this area of research, Artificial Neural Networks (ANNs) have been developed and compared with various traditional statistical models, with results indicating that ANN techniques offer better forecasting models (Sebri 2013; Behboudian et al. 2014; Mouatadid and Adamowski 2016; Guo et al. 2018).

The need for increased reliability, capability and accuracy regarding data-driven techniques has motivated the development of hybrid models, which would integrate two or more techniques with the aim of outperforming the capability of single models. In these hybrid approaches, typically one of the techniques would be deemed as the primary one, and the others would work as pre-processing or post-processing methods (Araghinejad 2014). Recently, several hybrid techniques have been applied to predict water demand, for example Anele et al. (2017), Altunkaynak and Nigussie (2018) and Seo et al. (2018).

Although previous studies have recognised the impact of weather factors, research has yet to investigate the effect of these factors thoroughly and systematically, that is, by using adequate data pre-processing to remove the impact of socio-economic factors, which are insensitive to climate change, and by applying a powerful and effective forecasting technique on a systematic basis instead of the commonly used trial-and-error approach. As such, studies to date have not been able to determine the extent to which climate factors drive municipal water demand, and the debate continues about the best strategies for managing municipal water demand under the impact of climate change.

Previous research on the influence of climate change on municipal water demand using a recommended baseline period has not been properly conducted. These studies have suffered from inadequate sample size, the mixing of evidence for climate change impact with socioeconomic factors and several conceptual and methodological weaknesses.

Various optimisation approaches can be adopted to handle a range of issues in different application domains. The goal of an optimisation algorithm is to determine the best parameter values of a system under different conditions (Ahmed et al. 2016). Recently, the gravitational search algorithm (GSA) proposed by Rashedi et al. (2009) has been applied to various optimisation problems, such as unconstrained global optimisation (García-Ródenas et al. 2019), hydrology (Karami et al. 2019) and geothermal power plant optimisation (Özkaraca and Keçebaş 2019). The particle swarm optimisation (PSO) algorithm has been used in fields such as sediment yield forecasting (Meshram et al. 2019), operation rule derivation for hydropower reservoirs (Feng et al. 2019) and semi-supervised data clustering (Lai et al. 2019).

Following the above review, the principal objectives of this paper are:

  1) To remove the effect of socioeconomic factors which are insensitive to weather and have a deterministic relationship with water consumption, and also to remove noise from water consumption for a long-term, monthly time series.

  2) To provide a new reliable and efficient hybrid technique (LSA-ANN) to forecast long-term monthly municipal water demands and evaluate how it compares with hybrid (GSA-ANN and PSO-ANN) models.

  3) To assess the long-term influence of climate change using monthly municipal water demand relative to the period 1980–2010.

To the best of our knowledge, this is the first study that tackles the aforementioned objectives to assess long-term influence of climate change using monthly municipal water demand from the baseline period 1980–2010.

2 The Study Area

One catchment area in Australia, Greater Melbourne, Victoria, was used to develop the water demand model. Yarra Valley Water (YVW) is one of three retail water utilities that deliver essential municipal water supplies and sewerage services to more than 1.8 million people and 50,000 businesses in the catchment area of the Yarra River, Melbourne. YVW buys water wholesale from Melbourne Water; this water is usually harvested from protected catchments in the mountains. It delivers water to different sectors, including commercial, industrial and residential (indoor and outdoor) users. The service area managed by the company is approximately 4000 km², covering the northern area of Melbourne and the eastern suburbs, from Wallan in the north to Warburton in the east (YVW 2017).

3 Model Data Set

This study uses monthly historical data comprising measured municipal water consumption (megalitres, ML), maximum temperature (°C), minimum temperature (°C), mean temperature (°C), rainfall (mm) and solar radiation (MJ/m²) over the period 1980–2010. These data were collected from the Yarra Valley Water Company for the areas it serves in Melbourne.

This range of climate factors has been used by several researchers (Kadiyala et al. 2015; Osman et al. 2017; Fenta Mekonnen and Disse 2018) in different areas of study to assess the impact of climate change, as these factors are considered robust predictors able to simulate municipal water demand, as shown in Zubaidi et al. (2018a). Socioeconomic variables such as population, water price and household income are deterministic signals (Zhoua et al. 2000; Gato et al. 2005) and were therefore not included in the current analysis, as they are outside the scope of this study.

Melbourne has various meteorological stations spread throughout the city. The Yarra Valley Water Company provided the average daily values of all the climate factors for its service area. The company had obtained these data from the Australian Bureau of Meteorology, which had applied the arithmetic mean method to calculate average values of the climate factors. With this technique, the values of a climate variable from the different meteorological stations are added together and then divided by the total number of stations to obtain the mean value of that variable, as shown in Eq. (1). This is a simple and standard technique for calculating average daily values. Each meteorological station has equal weight, regardless of its location (Bhavani 2013).

$$ {p}_m=\frac{{p}_1+{p}_2+{p}_3+\dots +{p}_n}{n} $$
(1)
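As a small worked illustration of Eq. (1), the sketch below averages one climate variable across stations with equal weights. The function name `station_mean` and the four station readings are hypothetical and are not taken from the study data.

```python
def station_mean(readings):
    """Equal-weight arithmetic mean of one variable across n stations (Eq. 1)."""
    if not readings:
        raise ValueError("at least one station reading is required")
    return sum(readings) / len(readings)


# Hypothetical daily maximum temperatures (°C) at four stations.
tmax_by_station = [21.4, 22.0, 20.6, 22.8]
area_tmax = station_mean(tmax_by_station)
print(f"area-average Tmax: {area_tmax:.2f} °C")
```

Because every station carries the same weight, a dense cluster of stations in one part of the service area would pull the average towards local conditions there; the source notes this equal weighting explicitly.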

4 Methodology

The municipal water demand model proposed here allows a long-term time series demand prediction to be calculated regarding climate change. Figure 1 presents a diagrammatic representation that contains the steps required to build the water prediction model.

Fig. 1 Flowchart showing the steps required to forecast future municipal water demand

4.1 Pre-Processing of Data

The data pre-processing approach followed in this study comprises three steps: normalisation, cleaning and determination of the model input. They are detailed below.

4.1.1 Normalisation

In this study, the natural logarithm was used to normalise the data, making the series more stable and reducing collinearity among the independent variables (Behboudian et al. 2014).

4.1.2 Cleaning

Data cleaning includes the identification and removal of trends and non-stationary components from a data set, as explained in Abrahart et al. (2004). A time series \( y_t \) can be decomposed into trend (T), oscillatory (O), stochastic (S) and noise (ε) components (the trend and oscillatory components are considered deterministic signals), as shown in Eq. (2) (Araghinejad 2014).

$$ {y}_t={T}_t+{O}_t+{S}_t+{\upvarepsilon}_t $$
(2)

To identify outliers, the box-and-whisker method was used, and the outliers were then treated. Singular spectrum analysis (SSA) was also used to extract the stochastic signals from the long-term monthly municipal water consumption and climate variable time series (i.e. to remove the impact of socioeconomic variables and noise from the municipal water consumption data).
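The normalisation and outlier steps can be sketched as follows. This is a minimal illustration under stated assumptions: the 1.5 × IQR whisker rule is the conventional box-plot definition, clipping flagged values to the whisker fences is only one possible treatment (the source does not say how outliers were treated), and the consumption values are made up.

```python
import math


def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between order statistics."""
    s = sorted(values)

    def quantile(p):
        pos = p * (len(s) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    return quantile(0.25), quantile(0.75)


def whisker_fences(values, k=1.5):
    """Lower and upper box-and-whisker fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Assumed treatment: pull values outside the fences back to the fence."""
    lo, hi = whisker_fences(values, k)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical monthly consumption in ML, with one implausible spike.
consumption = [410.0, 395.0, 402.0, 388.0, 420.0, 415.0, 399.0, 1200.0]
log_consumption = [math.log(v) for v in consumption]  # natural-log normalisation
cleaned = clip_outliers(log_consumption)
```

In this toy series only the final value lies outside the fences, so the other seven log-transformed values pass through unchanged.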

SSA is a robust method used to decompose a raw time series, which may exhibit nonlinear properties, and to uncover the stochastic component after the removal of noise, trend and oscillatory components, as illustrated by Khan and Poskitt (2017). The stochastic component helps to identify the impact of climate volatility on water consumption, to enhance the accuracy of the forecast and to decrease the error between measured and predicted water demand (Zubaidi et al. 2018a). The SSA method consists of two steps: decomposition of the original time series into principal components (PCs) containing trend, oscillatory and irregular components, followed by noise removal to allow the reconstruction of a new time series that has less noise (Zubaidi et al. 2018a). This approach does not require any statistical assumptions such as normality or linearity. It has been successfully applied in different sectors including industry (Al-Bugharbee and Trendafilova 2016), mid-term water demand prediction (Zubaidi et al. 2018) and hydrology (Ouyang and Lu 2017). Further details about SSA can be found in Golyandina and Zhigljavsky (2013).
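The two SSA steps described above (decomposition, then reconstruction from selected components) can be sketched with NumPy. This is a minimal sketch of basic SSA, not the authors' implementation: the window length of 24 months, the choice to keep four components and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def ssa_components(series, window):
    """Split a 1-D series into additive elementary components via basic SSA."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 2 <= window <= n // 2:
        raise ValueError("window must lie between 2 and len(series) // 2")
    k = n - window + 1
    # Embedding: column i holds the lagged window x[i : i + window].
    trajectory = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    components = np.empty((s.size, n))
    for i in range(s.size):
        elementary = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: the mean of each anti-diagonal gives one time step.
        flipped = elementary[::-1]
        components[i] = [flipped.diagonal(j).mean() for j in range(-window + 1, k)]
    return components, s


# Synthetic monthly series (illustrative only): trend + annual cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
series = 0.02 * t + np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(120)

components, singular_values = ssa_components(series, window=24)
# Keeping only the leading components (four here, an arbitrary choice)
# reconstructs a smoother series with less noise.
denoised = components[:4].sum(axis=0)
```

Summing all elementary components recovers the original series exactly, so the analyst's task is to decide which components to label as trend, oscillation, stochastic signal or noise. The singular values returned alongside the components are the quantities usually inspected for that decision.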

4.1.3 Determination of Model Input

The choice of the explanatory variables that influence water consumption as model input data is an important step in the development of not only an ANN forecasting model, but any good model (Maier and Dandy 2000). In this study, cross-correlation and variance inflation factor (VIF) techniques were applied to select the model inputs and to check for multicollinearity among them, as previously carried out by Zubaidi et al. (2018a).
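A minimal sketch of these two screening checks is given below. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse correlation matrix, i.e. 1 / (1 − R_j²). The helper names, the optional `lag` argument and the synthetic data are hypothetical and only illustrate the idea.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows are observations)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal of the inverse correlation matrix.
    return np.diag(np.linalg.inv(corr))


def lagged_correlation(x, y, lag=0):
    """Pearson correlation between x[t - lag] and y[t], for lag >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag:
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]


# Synthetic predictors (illustrative only): tmean is almost a copy of tmax.
rng = np.random.default_rng(1)
tmax = 20 + 5 * rng.standard_normal(372)
tmean = tmax - 5 + 0.1 * rng.standard_normal(372)
rain = 50 + 10 * rng.standard_normal(372)
demand = 0.8 * tmax - 0.1 * rain + rng.standard_normal(372)

vif = variance_inflation_factors(np.column_stack([tmax, tmean, rain]))
r_tmax = lagged_correlation(tmax, demand)
```

In this toy data set the two temperature series inflate each other's VIF heavily while rainfall stays close to 1, which is the kind of pattern that would lead an analyst to keep only one of a group of collinear predictors.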

To decide on the appropriate sample size needed to develop a good model, Tabachnick and Fidell (2013) propose using a sample size that is dependent on the number of predictors, as shown in Eq. (3). In this study, the sample size is 372.

$$ N\ge 104+m $$
(3)

where N = sample size and m = number of independent variables.

4.2 Artificial Neural Network Techniques for Forecasting Municipal Water Demand

This section will briefly present the techniques used in this study, including ANN, LSA as an optimisation algorithm, and the hybrid LSA-ANN technique.

4.2.1 Artificial Neural Networks (ANN)

Previous studies have demonstrated the power of ANN to produce good non-linear models for urban water demand (Toth et al. 2018). However, unlike in other hydrological applications, ANN has not been extensively used in municipal water demand modelling (Zubaidi et al. 2018b), even though it has proven able to deal with a large number of input and output patterns and to handle complex nonlinear environmental problems, making it appropriate for long-term prediction modelling (Mutlu et al. 2008).

For this study, a multilayer perceptron (MLP) network (a feed-forward network trained by backpropagation) was used, along with the Levenberg-Marquardt (LM) learning algorithm. The tan-sigmoid activation function was adopted in both hidden layers to cover all negative input values, while the output layer used a linear activation function to cover the positive values of water demand. The model was implemented using the MATLAB Neural Network Toolbox (Mathworks 2017). The data were randomly separated into three sets, namely training, testing and validation sets, containing 70%, 15% and 15% of the instances, respectively, as previously done in Zubaidi et al. (2018b) and Zubaidi et al. (2018a).
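The random 70/15/15 partition can be illustrated with a short sketch. The study itself used the MATLAB toolbox; the Python function below, its name `split_indices`, the fixed seed and the rounding rule are assumptions made only to show how 372 monthly records would be divided.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition range(n) into training, testing and validation indices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # remainder, so nothing is dropped
    return train, test, validation


# 372 monthly records, as in the 1980-2010 data set.
train, test, validation = split_indices(372)
print(len(train), len(test), len(validation))
```

Assigning the remainder to the last subset guarantees that every record lands in exactly one set, even when the fractions do not divide the sample size evenly.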

4.2.2 Overview of the Lightning Search Algorithm for ANN Optimisation

Optimisation in this context refers to the process of determining the best solution to a problem, given its input variables, once a fitness function has been defined. The formulation of this function often depends on the application and can be expressed as minimal error or cost, or as optimal design or management. LSA is a recent, nature-inspired metaheuristic optimisation algorithm, based on the natural phenomenon of lightning, for tackling constrained optimisation problems. The algorithm is inspired by the probabilistic nature and tortuous characteristics of lightning discharges during a thunderstorm, and it is generalised through the mechanism of step leader propagation. It considers the involvement of fast particles, known as projectiles, in the formation of the binary tree structure of a step leader. Three kinds of projectiles are defined: transition projectiles, which create the first step leader population N; space projectiles, which attempt to become the leader; and the lead projectile, which represents the best-positioned projectile found among the N step leaders (Mutlag et al. 2016; Shareef et al. 2015).

LSA is similar to other metaheuristic algorithms in that it needs a population to start the search (Ahmed et al. 2016). Further details about the LSA algorithm, including a review of its basic concepts, can be found in Shareef et al. (2015).

4.2.3 Hybrid Lightning Search Algorithm-Based Artificial Neural Network

ANN can be employed to predict municipal water demands using climate variables as the model input (Zubaidi et al. 2018a). To do so, it is important to consider the number of neurons in the hidden layers and the learning rate coefficient, as these are essential factors of an ANN architecture. These factors are responsible for mapping the relationship between the input and output variables used to develop the ANN model and for minimising error (Gharghan et al. 2016). However, the choice of the number of neurons and the learning rate usually depends on a trial-and-error process that may not yield an optimal solution. LSA addresses this issue, and thereby enhances the performance of the ANN, by estimating the best values for the learning rate coefficient and the number of neurons in each hidden layer. It uses a fitness function based on the root mean squared error (RMSE), so that the LSA-ANN is tuned by minimising this error.
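The general pattern of a population-based search over the hidden-layer sizes and the learning rate can be sketched as below. To keep the example short and verifiable it uses a plain particle swarm optimiser (one of the benchmark algorithms in this study) rather than LSA itself, and it replaces "train the ANN and return its RMSE" with a hypothetical surrogate function. The bounds, swarm size, coefficients and surrogate are all assumptions.

```python
import random


def pso_minimise(fitness, bounds, n_particles=20, n_iter=60, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimiser; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                # Keep each particle inside the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def surrogate_rmse(params):
    """Hypothetical stand-in for 'train the ANN and return its RMSE'."""
    n1, n2, lr = params
    return 0.02 + 1e-4 * ((round(n1) - 8) ** 2 + (round(n2) - 5) ** 2) + (lr - 0.05) ** 2


# Search space: neurons in hidden layers 1 and 2, then the learning rate.
bounds = [(1, 20), (1, 20), (0.001, 0.5)]
best_params, best_rmse = pso_minimise(surrogate_rmse, bounds)
best_neurons = (round(best_params[0]), round(best_params[1]))
```

In a real hybrid model the fitness call would train a network with the candidate settings and return its error, which is why the number of fitness evaluations (population size times iterations) dominates the computational cost.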

4.3 Performance Measurement Criteria

After calibrating all the model structures using the calibration/training data set, performance was assessed using several standard statistical criteria that quantify the errors in the model simulations (Adamowski 2008). These criteria measure the accuracy of the estimates; the estimate errors therefore play an important role in selecting an appropriate model and in indicating how current models might be altered to reduce deviations in future simulations (Donkor et al. 2014). The following statistical criteria are used in the current model’s calibration: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and correlation coefficient (R). These criteria are defined in Eqs. (4) to (7).

$$ MAE=\frac{\sum_{m=1}^N\left|{y}_o-{y}_p\right|}{N} $$
(4)
$$ MSE=\frac{\sum_{m=1}^N{\left({y}_o-{y}_p\right)}^2}{N} $$
(5)
$$ RMSE=\sqrt{\frac{\sum_{m=1}^N{\left({y}_o-{y}_p\right)}^2}{N}} $$
(6)
$$ R=\left[\frac{\sum \limits_{m=1}^N\left({y}_o-\overline{y_o}\right)\left({y}_p-\overline{y_p}\right)}{\sqrt{\sum {\left({y}_o-\overline{y_o}\right)}^2\sum {\left({y}_p-\overline{y_p}\right)}^2}}\right] $$
(7)

where \( y_o \) represents observed water consumption; \( y_p \), simulated water demand; N, sample size; \( \overline{y_p} \), the mean of simulated demand; and \( \overline{y_o} \), the mean of observed consumption.
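Equations (4) to (7) translate directly into code. The sketch below uses only the Python standard library; the function names and the four-point observed and predicted series are hypothetical values chosen so that the results are easy to check by hand.

```python
import math


def mae(obs, pred):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mse(obs, pred):
    """Mean squared error, Eq. (5)."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error, Eq. (6)."""
    return math.sqrt(mse(obs, pred))


def pearson_r(obs, pred):
    """Correlation coefficient, Eq. (7)."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    num = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    den = math.sqrt(sum((o - mean_o) ** 2 for o in obs)
                    * sum((p - mean_p) ** 2 for p in pred))
    return num / den


# Hypothetical observed and simulated values (every absolute error is 0.5).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 2.5, 4.5]
print(mae(observed, simulated), mse(observed, simulated),
      rmse(observed, simulated), round(pearson_r(observed, simulated), 4))
```

Note that MSE and RMSE carry the same information, with RMSE expressed in the units of the (log-transformed) demand, whereas R measures linear association and is insensitive to a constant bias in the predictions.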

The stationarity of the stochastic time series for all variables was examined with the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test. A residual analysis was also used to check the goodness of fit of the ANN model.

5 Results and Discussion

5.1 Model Inputs

This section corresponds to step A in Fig. 1. Five monthly climate factors were used to assess the impact of climate change on monthly water consumption: maximum temperature (Tmax), minimum temperature (Tmin), mean temperature (Tmean), solar radiation (Radi) and rainfall (Rain). Following data pre-processing, which included normalisation by natural logarithm and the treatment of outliers, SSA was applied as a pre-treatment signal analysis to uncover the stochastic component. The components of each original time series were examined to detect the stochastic signal. This was the third component for water consumption and for all the climate factors except solar radiation, for which it was the second component. The stationarity of the stochastic signals was examined using the ADF and KPSS tests. Figure 2 presents the original time series and the first four components of water consumption and all the climate factors.

Fig. 2 Original signal and the first four components obtained by SSA

Fig. 3 Eigenvalues of water consumption time series

To detect noise components, Ghodsi et al. (2009) pointed out that a significant drop in the eigenvalue spectrum can be taken as the beginning of pure noise. Figure 3a shows the eigenvalue spectrum for the water consumption time series, where it can be seen that the first signal, which represents a trend, dominates and obscures all other detail. The first signal was therefore removed and the graph redrawn in panel b, where a significant drop occurs in the third signal, representing the beginning of the noise floor.

Fig. 4 Correlations between water consumption and climate factors

The variance inflation factor (VIF) was used to examine multicollinearity among the model input variables. Three independent factors, Tmax, Radi and Rain, were selected as the model input. The sample size required for the model was estimated using Eq. (3), which shows that at least 107 cases (104 + 3) were needed. In this study, the number of cases is N = 372, which is more than three times the minimum required.

The Pearson product-moment correlation coefficient was used to determine the relationship between the stochastic components of water consumption and the chosen climate variables. Figure 4 shows the correlation between the independent and dependent variables. A strong correlation was found between the stochastic signals of long-term water consumption and maximum temperature (R = 0.94). This result suggests that the data pre-processing techniques are effective.

From these results, we can see that water demand (dependent variable) can be expressed as a function of Tmax, Radi and Rain (independent variables).

5.2 Application of the Hybrid LSA-ANN Algorithm

This section corresponds to step B in Fig. 1. A MATLAB toolbox was used to run the LSA-ANN, GSA-ANN and PSO-ANN algorithms. To estimate the best number of hidden neurons and the optimum learning rate coefficient for all three techniques, five population sizes (10, 20, 30, 40 and 50) were used. Note that these population sizes refer to the size of the swarm, which is different from the sample size mentioned before. As can be seen in Fig. 5, a population size of 50 provides the best solution for all three algorithms. Closer inspection of the fitness function values shows that the RMSE for the LSA-ANN algorithm (after 40 iterations) is 0.0236, whereas GSA-ANN does not improve beyond an RMSE of 0.0241. The PSO-ANN algorithm only reaches its best RMSE of 0.0245 after 62 iterations. As such, the LSA-ANN algorithm outperforms GSA-ANN and PSO-ANN, as it achieves a smaller error (better performance) in a smaller number of iterations. Table 1 lists the design parameters of the ANN model based on the LSA-ANN algorithm.

Fig. 5 Fitness function for various populations using the computational intelligence algorithms

Table 1 ANN-designed parameters

5.3 Application of Artificial Neural Networks

This section corresponds to step C in Fig. 1. After identifying the parameters for the ANN, the model was run several times to find the best neural network architecture to forecast municipal water demand. A range of statistical tests was applied to evaluate the performance of the model. Firstly, the results of the correlation analysis and the residual distribution between observed and simulated municipal water demand are presented in Fig. 6; the correlation coefficient for the validation stage is 0.96.

Fig. 6 LSA-ANN algorithm performance for the validation data

Additionally, Table 2 provides three measures of the differences between the predicted and observed time series, to evaluate the model performance. It can be seen that the differences between the observed and predicted water demands are negligible (MSE = 6.3911 × 10⁻⁴).

Table 2 Three statistical criteria for the validation data

All these results reveal and confirm that:

  (1) Tmax, Rain and Radi are reliable predictors for simulating long-term municipal water demand, as they were previously for mid-term water demand.

  (2) Data pre-processing techniques, specifically the SSA method, play a significant role in uncovering the stochastic signal and removing the impact of socio-economic factors and noise from long-term time series. These techniques are therefore effective for long-term as well as mid-term series, as shown in previous work.

  (3) The LSA-ANN algorithm is a reliable model that can be used to forecast long-term municipal water demand, performing more accurately than the GSA-ANN and PSO-ANN algorithms (used in previous studies for short- and mid-term forecasting) evaluated in this study.

  (4) The most important finding is the confirmation of the association between climate change and water demand over the long term.

This study is one of the first attempts to thoroughly examine the influence of climate change on municipal water demand. Its key strengths are the use of data over an extended baseline period, 1980–2010, and the use of climate factors, which extends knowledge of how climate change drives municipal water demand. Further research is, however, needed to determine the long-term effects of global warming on water demand.

6 Conclusion

Estimating water demand is an essential component of the planning and management of water resources, as it can help to identify suitable alternatives that guarantee a balance between water demand and supply in the future. This study explored the influence of climate change on long-term monthly municipal water demand, using data from the baseline period 1980 to 2010 and applying a coupled SSA and LSA-ANN technique. One of the more significant findings is the confirmation that maximum temperature, solar radiation and rainfall are reliable predictors when forecasting long-term municipal water demand, as previously seen for mid-term demand. SSA proved to be a powerful technique for uncovering the stochastic components of long-term water consumption after removing noise and the components related to socio-economic factors, confirming that the technique works for time series of different lengths, as shown in earlier work. The LSA-ANN algorithm proved successful, and indeed more accurate than the GSA-ANN and PSO-ANN algorithms previously applied to time series of other lengths. The paired SSA and LSA-ANN model predicted water demand with an R of 0.96. The current findings support the relevance of climate change to water consumption, which is significant for both practitioners and policy-makers. More research is, however, required to develop a deeper understanding of the relationship between climate change and municipal water demand over the long term and at different locations.