1 Introduction

Demand for water is expected to increase during the next few years, as the world population is predicted to grow from 6.9 billion in 2009 to 8.3 billion in 2030 and 9.1 billion in 2050 (UNDESA 2009). This population growth and the increasing urbanization trend will lead to a higher water demand and, at the same time, compromise the ability of ecosystems to provide conventional and cleaner supplies (The World Bank 2012). Although there are different types of strategies to deal with imbalances between water supply and demand, the use of demand-side policies has emerged as a preferred option during the last decades. Among these, pricing policies have become a particularly attractive option, since they may result in smaller efficiency losses than other rationing alternatives (Roibás et al. 2007). In this sense, accurate estimates of the price elasticity of water demand are crucial for policy decision-making, since they make it possible for water policy designers to understand how strongly water consumption will react to changes in price. However, pricing policies are often constrained by regulation, due to the characteristics of the good (Olmstead and Stavins 2007), so water suppliers also use non-price conservation programs to induce water reductions.

In the past, the neoclassical approach assumed that “tastes neither change capriciously nor differ importantly between people” and it was limited “to search[ing] for differences in prices and incomes to explain any differences or changes in behavior” (Stigler and Becker 1977, p. 76). However, there may be great heterogeneity in water consumption, even amongst individuals who are similar in observable characteristics. Therefore, when analyzing the estimation of residential water demand using microdata, addressing unobserved heterogeneity is a critical issue, since the demand functions are influenced by unobservable heterogeneous preferences.

During the last few years, the problem of unobserved heterogeneity has received special attention, resulting in two main ways of addressing it. One approach confines unobserved heterogeneity in an individual-specific effect, as in linear panel data models such as fixed-effects and some random-effects models, while assuming that the marginal response to the demand determinants is the same across individuals. A more flexible approach, based, for example, on the use of random coefficients models, assumes, instead, that the regression parameters vary randomly across individuals according to some distribution and identifies the mean and the standard deviation for these parameters (Cameron and Trivedi 2005, pp. 9–10). Another common strategy in the water demand literature has been to group individuals a priori according to observable characteristics that are assumed to be proxies for unobserved preferences and tastes.

The approach adopted here differs from the previous literature in that we use a Latent Class Model (LCM) to control for heterogeneity in preferences. It allows us to identify a finite number of consumer “classes” and hence different water demand functions. This methodology consists of estimating the model in two simultaneous steps. One involves estimating the main function, in this case water demand, and the other estimates the probability that each consumer belongs to a given class. By sorting individuals based on the similarity of their unobserved component, the LCM accommodates unobserved heterogeneity, while tractability and theoretical consistency are preserved in line with the so-called Ockham’s razor. That is, water use would be perfectly explained with a model in which each consumer had a different water demand function. However, that model would be practically intractable and useless for predicting water consumption. Instead, the LCM groups consumers into the minimum number of classes that is consistent with common preferences.

The LCM has several advantages over the techniques discussed above. Compared to the first approach (using linear panel data models, such as fixed-effects and random-effects) it also accounts for slope heterogeneity across different groups of consumers, instead of confining unobserved heterogeneity to an individual-specific effect and constraining all consumers to have the same marginal effects. As stated above, the second approach (the random coefficient model) assumes that the coefficients are different for each consumer. The LCM, by contrast, identifies consumer profiles that may be more easily managed when it comes to implementing water conservation policies. The LCM does not require an ad hoc selection of class membership, which could be highly sensitive to arbitrariness, since it segments consumers endogenously into different groups. Moreover, the LCM identifies classes and allows flexible modelling of the probability of belonging to a certain group (within which unobservable preferences are similar) as a function of a set of, in principle, observable covariates. Therefore, it provides information about the size of each group and a description of the type of consumer belonging to them. This information can be very useful for the design of water management policies, as long as information about the factors that affect class membership can be obtained at a reasonable cost.Footnote 1

Our application exploits a panel dataset from Granada (Spain) that contains information on water consumption and prices for the period 2009–2011, as well as on socioeconomic variables and self-reported water conservation habits from a household survey carried out in 2011, which can be useful to control for individual heterogeneity. This dataset is of particular interest for two reasons. First, Spain is the most semi-arid country in the European Union (Lopez-Gunn et al. 2012) and the South of Spain, where the city of Granada is located, is regularly affected by droughts and other water availability issues. Thus, it is important to understand residential water demand in order to improve water management. Second, there was a change in the price structure in the city of Granada in 2011, which makes it possible for us to consider not only changes in price levels but also changes in the size and number of price blocks when analyzing consumer responses to the water tariff. Our findings provide potentially useful information for regulators by identifying four different residential water consumer profiles. We also derive some informative conclusions from the analysis of the change in the price structure effected in 2011. Additionally, a sensitivity analysis is carried out to compare the results obtained using the LCM with those obtained using an alternative grouping technique. This analysis illustrates the superiority of LCMs for identifying homogeneous groups of consumers.

The paper has the following structure. In Sect. 2, we discuss different methods that previous literature has applied to deal with heterogeneity issues. Section 3 presents the econometric model. Section 4 describes the tariff structure in the city of Granada, paying special attention to the change in the structure in 2011. Section 5 describes the data. Estimates from the LCM and sensitivity analysis are presented in Sect. 6. Finally, Sect. 7 concludes summarizing the main results.

2 Background

Understanding residential water demand is essential to the effective management of water resources. Consequently, the literature on residential water demand is vast, as revealed by the many studies that have surveyed the estimation of water demand. For example, Arbués et al. (2003) focus on different modeling approaches and data sets; Dalhuisen et al. (2003) include a meta-analysis of price and income elasticities; Worthington and Hoffman (2008) provide a survey of model specification and results; and Nauges and Whittington (2010) review the literature analyzing household residential demand in developing countries.

As mentioned in Sect. 1, it is important to account for heterogeneity, particularly when analyzing the effect of a change in the price structure. Differences in terms of price elasticities may be due to the underlying heterogeneity among regions and even households. Thus, an increasing number of studies aim to control for the presence of unobserved individual heterogeneity in residential water demand. However, the common methods to address heterogeneity seem to perform relatively poorly under certain circumstances. We discuss the problems associated with each technique, providing additional arguments to support the use of Latent Class Analysis.

A frequently adopted approach is to control for unobserved household heterogeneity through the inclusion of household fixed effects. For example, Pint (1999) uses a fixed-effects model and an ordinary least squares (OLS) model to estimate household responses to water price structure changes in California, finding the fixed-effects model to be preferable to the OLS model. However, neither of these estimations considered instrumental variable (IV) specifications, resulting in upward-sloping water demand at high prices. Worthington et al. (2009) analyze residential water demand in several councils in Queensland by estimating common-effects (whereby they assumed that water consumption was homogeneous across local councils), fixed-effects, and random-effects models. Their results show that the fixed-effects model outperforms the others for that particular case. Coleman (2009) develops dynamic models of water demand in Salt Lake City estimated using fixed-effects models and compares them with static models obtained using pooled, fixed-effects and random-effects models. Polebitski and Palmer (2010) estimate pooled, fixed-effects and random-effects models to analyze single-family residential water demand for over 100 census tracts for the period 1991–2005, and the Hausman test indicates that the fixed-effects model is preferred over the random-effects model. Nataraj and Hanemann (2011) include household and year fixed effects in a regression discontinuity model to account for heterogeneity across the treatment and control households in a natural experiment to determine whether consumers react to an increase in marginal price.

As previously discussed, another way to handle heterogeneity is through the estimation of Random-Coefficient Models (RCM). This methodology has not been widely used in the water demand literature, likely because it is difficult to interpret as a tool to identify groups of individuals with relatively similar responses to changes in explanatory variables. This is because the RCM assumes a continuous distribution of random individual-specific regression parameters and only identifies the mean and the standard deviation of each of these distributions. This limits its usefulness in a case like ours. As far as we are aware, Miyawaki et al. (2010) is the only study that applies this methodology in this field. They conduct an analysis of Japanese residential water demand estimating a random parameters model and a first-order autoregressive error component model, obtaining similar results from both estimations.

An alternative approach consists of including, in the demand function, dummy variables for socioeconomic and demographic characteristics that can capture differences in individuals’ preferences. Renwick and Green (2000) incorporated irrigation dummy variables into the demand equation to account for differences in outdoor water use. Krause (2003) investigated consumer heterogeneity in water demand using a set of experiments and a survey. First, they included group dummy variables interacted with the parameters in the demand function and then computed disaggregated demand functions for three consumer types considered in the experiment and surveys: student participants, workforce participants and retired participants. However, the ability of this technique to control for heterogeneity is clearly limited.

Some studies identify different groups of consumers according to observable characteristics that may be related to the consumers’ unobserved preferences. Renwick and Archibald (1998) analyze the effect of demand side policies by clustering groups of consumers in terms of income. Ruijs et al. (2008) estimate a linear demand function in the Metropolitan Region of Sao Paulo for the period 1997–2002 and evaluate welfare and distribution effects for five income groups. Mansur and Olmstead (2012) divide the sample into four sub-groups based on income and lot size in order to compare different price elasticities for indoor and outdoor water demand. However, these techniques rely on an ad hoc selection of group membership, which is highly sensitive to arbitrariness.

LCMs have attracted increased attention lately since, as we will describe in Sect. 3, this technique presents significant advantages. Among these advantages, we exploit the fact that it makes it possible to generate homogeneous groups of consumers without setting any a priori criteria. A number of studies use this methodology to analyze demand in other economic fields, such as health economics (Deb and Trivedi 2002; d’Uva 2006; Ayyagari et al. 2013; Hyppolite and Trivedi 2012), cultural economics (Boter et al. 2005; Fernandez-Blanco et al. 2009; Grisolía and Willis 2012) or transport (Hensher and Greene 2003; Shen et al. 2006; Shen 2010; Hess et al. 2011; Greene and Hensher 2013).

There are several applications of the LCM in environmental economics. For example, Scarpa et al. (2005) compare the use of the mixed logit random parameter model with the use of Latent Class Analysis to model the choice of water utility by the consumer. Patunru et al. (2007) implement this methodology to investigate the willingness-to-pay for the clean-up of hazardous waste by homeowners in Waukegan, Illinois. Scarpa et al. (2007) study different groups in the demand for hiking in the eastern Italian Alps, arguing that it is fundamental to assess heterogeneity when analyzing expected consumer surplus, predicted visitation, and response to access fees. Campbell et al. (2011) identify heterogeneous groups of respondents who were asked about their willingness-to-pay for improvements in four rural landscapes in the Republic of Ireland. However, to our knowledge, there have been no applications as yet to residential water demand functions.

Another typical concern identified in the residential water demand literature is price endogeneity, especially in the presence of nonlinear prices. As detailed by Olmstead (2009b), there are two types of estimation approaches that have been used in the literature to control for this problem: reduced-form approaches, such as IV, and structural approaches, such as discrete/continuous choice models (DCC). The IV approach is often undertaken in water demand analysis along with fully parametric or semiparametric methodologies such as two-stage least squares (2SLS) or Generalized Method of Moments (GMM).Footnote 2

In DCC models, a consumer’s utility maximization problem is solved in two steps. First, the consumer selects the block given the price of each block and then decides the level of consumption that maximizes her utility. These models have been used by relatively few papers in this literature. Hewitt and Hanemann (1995) develop a DCC model of residential water demand using household level data from Denton (Texas) for the period 1981–1985, obtaining price elasticities in the range of \(-\)1.57 to \(-\)1.63, which are much larger in absolute value than those obtained in the literature based on IV techniques. Apart from the models described above, Pint (1999) applies DCC models, obtaining relatively low price elasticities. Olmstead (2009b) compares IV and DCC estimates of water demand under increasing-block pricing using a Monte Carlo experiment, finding that both models exhibit significant bias in the simulations. Strong and Smith (2010) critique DCC models, as Bockstael and McConnell (1983) stated that the Marshallian “prices as parameters” demand function does not exist with a nonlinear budget constraint and, therefore, applied welfare analysis is problematic in this case. Moreover, this model is based upon marginal prices, which assumes that consumers are aware of the price structure.Footnote 3 There is no general theory recommending how to control for endogeneity in LCMs. Nevertheless, we use a two-stage control function approach (explained in the “Appendix”) because it performs better in nonlinear models.

3 Methodology

From a methodological viewpoint, LCMs are proposed to identify different groups of consumers. This methodology may perform well to estimate residential water demand for two main reasons. First, water demand functions are related to utility functions, which are based on consumers’ unobservable preferences and tastes that may differ across consumers. Therefore, LCMs allow us to identify groups of consumers who have similar unobservable preferences about how to change their water use in response to changes in a certain set of observable explanatory variables, since it sorts individuals based on the similarity of their conditional distributions. Second, from a statistical point of view, Fig. 1 shows that the distribution of residential water consumption is asymmetric in our sample. Therefore, this distribution is better approximated by a mixture of several normal distributions rather than a single normal (and symmetric) distribution.

Fig. 1 Distribution of residential water demand in Granada 2009–2011

In LCMs, we assume that the sample of individuals is drawn from a population that is a finite mixture of C distinct subpopulations (Cameron and Trivedi 2005). The density of the dependent variable (residential water consumption) y, for observation i conditionally on some parameters \((\beta ,\pi )\) and on some explanatory variables x can be written as:

$$\begin{aligned} f(y_{i}|x_{i};\beta ,\pi )= \sum _{j=1}^{C} {\pi _{j}} f_{j} (y_{i}|x_{i}\beta _{j}), \quad i= 1,\ldots ,n \end{aligned}$$
(1)

where \(\pi _{j}\) is the probability that individual i belongs to class j (\(\sum _{j=1}^{C} {\pi _{j}}=1\) and \(\pi _{j} \ge 0, \; j=1,\ldots , C\)).

If any potential sources of heterogeneity are observed, the probability that consumer i belongs to class j can be parameterized as a function of covariates assuming that the latent variable follows a multinomial probability that yields a multinomial logit model:

$$\begin{aligned} \pi _{j}=\frac{\exp (\gamma _{j}^{\prime }z_{i})}{\sum _{k=1}^{C}\exp (\gamma _{k}^{\prime }z_{i})},\quad j=1,\ldots ,C \end{aligned}$$
(2)

where \(\gamma _{j}\) is a vector of parameters to be estimated and \(z_{i}\) is a vector of observable characteristics and self-reported valuations that may be considered proxies for the underlying utility preferences (Fernandez-Blanco et al. 2009).
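As an illustration only, the multinomial logit class probabilities of Eq. (2) can be computed as follows. This is a minimal sketch, not the estimation code used in the paper: the function name, the covariate values and the coefficient vectors are hypothetical, and fixing the coefficients of one reference class at zero is the usual identifying normalization rather than something stated above.

```python
import math

def class_probabilities(z, gammas):
    """Multinomial logit class probabilities in the spirit of Eq. (2).

    z      -- covariate values z_i for one household (hypothetical inputs)
    gammas -- one coefficient vector gamma_j per class (hypothetical values)
    """
    scores = [sum(g * v for g, v in zip(gamma, z)) for gamma in gammas]
    top = max(scores)  # subtract the largest score so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Two classes; the second is treated as the reference class (gamma = 0).
probs = class_probabilities([1.0, 0.4], [[0.5, -1.0], [0.0, 0.0]])
```

The probabilities are non-negative and sum to one by construction, which is the restriction imposed on \(\pi _{j}\) in Eq. (1).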

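Therefore, if we consider a normal mixture, the likelihood contribution of each observation is the sum of C normal densities weighted by the probabilities of class membership of Eq. (2), denoted \(P_{ij}\) for consumer i and class j, and the log-likelihood is the sum over observations of the logarithms of these contributions (inside the square root, \(\pi \) is the mathematical constant):

$$\begin{aligned} \ln \mathfrak {L}(\beta , \gamma )= \sum _{i=1}^{n} \ln \left[ \sum _{j=1}^{C} P_{ij} \frac{1}{\sqrt{ 2\pi \sigma _{j}^{2}}} \exp \left( -\frac{\left( y_{i}-x_{i} \beta _{j}\right) ^{2}}{2\sigma _{j}^{2}}\right) \right] \end{aligned}$$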
(3)
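The log-likelihood of a normal mixture of this kind can be evaluated as in the sketch below: for each observation, the class-specific normal densities are weighted by the class membership probabilities, and the logarithm of that weighted sum is accumulated over observations. The function names and argument layout are assumptions made for illustration; in practice the function would be passed to a numerical optimizer or evaluated within an EM routine.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and std. deviation."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(ys, xs, betas, sigmas, probs):
    """Sum over observations of log( sum_j P_ij * N(y_i | x_i beta_j, sigma_j^2) ).

    ys     -- observed consumption per observation
    xs     -- regressor vector per observation
    betas  -- one coefficient vector per class
    sigmas -- one standard deviation per class
    probs  -- probs[i][j] is the class membership probability P_ij
    """
    total = 0.0
    for y, x, p_i in zip(ys, xs, probs):
        density = 0.0
        for beta, sigma, p in zip(betas, sigmas, p_i):
            mean = sum(xk * bk for xk, bk in zip(x, beta))
            density += p * normal_pdf(y, mean, sigma)
        total += math.log(density)
    return total
```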

One of the key issues in the application of the LCM is how to correctly determine the number of classes. Although LCMs with additional numbers of classes are considered nested models, it is not possible to identify the correct model using a likelihood ratio test (LRT), because regularity conditions are not met (Nylund et al. 2007). The usual way to proceed is to estimate models with increasing numbers of classes in a stepwise fashion and compare the results using likelihood-based information criteria, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). However, since these criteria do not share the same properties, they may yield contradictory verdicts. Nylund et al. (2007) analyzed the performance of these information criteria using Monte Carlo simulations and found that the Bayesian Information Criterion outperforms the others in correctly identifying the optimal number of classes.
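The stepwise comparison can be sketched as follows, using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The log-likelihood values, parameter counts and sample size in the example are invented purely to show that the two criteria can disagree; they are not results from this paper.

```python
import math

def aic(loglik, n_params):
    """Akaike Information Criterion (smaller is better)."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * loglik

def select_classes(fits, n_obs):
    """fits maps number of classes -> (log-likelihood, number of parameters)."""
    by_aic = min(fits, key=lambda c: aic(*fits[c]))
    by_bic = min(fits, key=lambda c: bic(fits[c][0], fits[c][1], n_obs))
    return by_aic, by_bic

# Hypothetical fits for models with 2 to 5 classes (illustrative numbers only).
fits = {2: (-5200.0, 20), 3: (-5100.0, 31), 4: (-5030.0, 42), 5: (-5015.0, 53)}
choice_aic, choice_bic = select_classes(fits, n_obs=1000)
```

Because the BIC penalty grows with ln n, it tends to favor more parsimonious models than the AIC in large samples, which is why the two criteria may point to different numbers of classes.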

Once the model is estimated, we use the parameter estimates to compute the posterior probabilities of belonging to each latent class:

$$\begin{aligned} \Pr [y_{i}\in c|x_{i}; y_{i};\theta ]=\frac{\pi _{c}f_{c}(y_{i}|x_{i}; \theta _{c})}{\sum _{j=1}^{C}\pi _{j}f_{j}(y_{i}|x_{i};\theta _{j})}\quad c= 1,\ldots ,C \end{aligned}$$
(4)
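Equation (4) is an application of Bayes' rule, which the following sketch makes explicit for normal components. The function names and the numerical inputs are hypothetical; the class parameters would come from the estimated model.

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and std. deviation."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_probabilities(y, x, betas, sigmas, priors):
    """Posterior class probabilities for one observation, as in Eq. (4).

    priors -- prior class probabilities pi_j for this observation
    """
    joint = []
    for beta, sigma, prior in zip(betas, sigmas, priors):
        mean = sum(xk * bk for xk, bk in zip(x, beta))
        joint.append(prior * normal_pdf(y, mean, sigma))
    total = sum(joint)  # denominator of Eq. (4)
    return [value / total for value in joint]

# Two hypothetical classes with predicted consumption 0 and 10.
post = posterior_probabilities(0.0, [1.0], [[0.0], [10.0]], [1.0, 1.0], [0.5, 0.5])
```

An observation is typically assigned to the class with the highest posterior probability, here the first class, since the observed value coincides with its predicted mean.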

4 Residential Water Tariffs in Granada

The water pricing structure in the city of Granada is based on increasing block prices (IBP). In this case, the tariffFootnote 4 also includes a fixed water service fee that must be paid regardless of the level of use and a set of increasing block prices. As shown in Table 2, the price structure in Granada remained unchanged between 2009 and 2010 but in 2011 the size of the price blocks was altered.
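To make the mechanics of an increasing block tariff concrete, the sketch below computes a bill and the implied average price. The block limits, block prices and fixed fee are invented for illustration and are not the Granada tariff; whether the fixed component enters the average price is also an assumption of this sketch.

```python
def water_bill(consumption, fixed_fee, blocks):
    """Bill under increasing block prices.

    blocks -- list of (upper_limit_m3, price_per_m3) in ascending order;
              the last block uses None as an open-ended upper limit.
    """
    bill = fixed_fee
    lower = 0.0
    for upper, price in blocks:
        if consumption <= lower:
            break  # nothing is consumed in this block or above
        top = consumption if upper is None else min(consumption, upper)
        bill += (top - lower) * price
        if upper is None:
            break
        lower = upper
    return bill

def average_price(consumption, fixed_fee, blocks):
    """Total bill divided by the quantity consumed (assumes consumption > 0)."""
    return water_bill(consumption, fixed_fee, blocks) / consumption

# Hypothetical three-block schedule and fixed fee.
blocks = [(10.0, 0.5), (20.0, 1.0), (None, 2.0)]
bill = water_bill(25.0, fixed_fee=5.0, blocks=blocks)
```

Because the price paid for the last unit depends on the block reached, price and quantity are determined jointly, which is the source of the endogeneity problem discussed in this paper.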

The fixed component of the tariff includes a water supply fee, a sewage collection fee, and a treatment fee and, in 2009 and 2010, a drought surcharge. Additionally, in 2011 a water tax collected on behalf of the Andalusian regional government was incorporated into the tariff.

The evolution of the prices in each block is shown in Table 1 (in real terms calculated using the province-level Consumer Price Index with base 2011).

Table 1 Evolution of prices 2009–2011

The tariffs in Granada were reviewed annually. Block prices were adjusted upwards from 2009 to 2010 but, as mentioned above, the price structure remained unchanged. However, in 2011 the rate schedule was also changed. Footnote 5 The rate schedule is described in Table 2.

Table 2 Evolution of the size of pricing blocks

As water becomes increasingly scarce in the South of Spain, water supply managers are using price as a water conservation tool. As stated above, Granada experienced a change in the price structure that resulted in a decrease in average water consumption but also an increase in the average total bill (Table 3).

Table 3 Evolution of the average total bill and the average quantity of water consumed

5 Data

Our dataset is an unbalanced panel of bimonthly observations corresponding to 1,465 households in the city of Granada covering the period 2009–2011. The data come from two sources. The first source of information consists of water consumption and water tariffs data on a random and representative sample of urban households in the city of Granada, provided by EMASAGRA, the company in charge of water supply and sewage collection in Granada. The second one is a 2011 survey of these households, who were questioned about socioeconomic characteristics (occupation, household size), housing characteristics (size, equipment), attitudes towards the environment, and conservation habits.

Data on water consumption and water tariffs was merged with survey data. Since the survey was carried out in 2011, we only have information related to socioeconomic characteristics from that year. However, since the variables considered in the analysis are usually time invariant in the short and mid-run, we consider them applicable to the period 2009–2011.

Moreover, the average price is the only time-varying explanatory variable in our specification. As explained in Sect. 4, the pricing structure is based on IBP and, therefore, we must consider the price endogeneity generated by the simultaneous determination of the price level and the level of consumption that determines the price block. When addressing this issue, we face the problem that both water consumption and average price change within a given year in our dataset but we cannot observe a set of exogenous instruments that also vary within each year. Therefore, in order to address the endogeneity problem, it was necessary to aggregate water consumption by year, which made it possible to use the set of marginal prices per block, which change yearly, as instruments. Furthermore, after this transformation, we also excluded from the sample those individuals who were not observed for the entire year 2011, because of the possible bias introduced by seasonality in their water consumption.

Therefore, after the data aggregation, the dependent variable in our specification is the average bimonthly household water consumption per year, in cubic meters, which was calculated by dividing total consumption per year by the number of 2-month billing periods. Regarding the price variable, two main issues arise when analyzing water demand under a nonlinear pricing scheme. First, one must choose between marginal and average price. In this particular case, consumers indicated that they were not well informed about the pricing scheme. Therefore, households may be more sensitive to changes in average price (AvP) than in marginal price. The second issue, as noted above, relates to the price endogeneity generated by the simultaneous determination of price and the block of consumption. In the absence of a general theory about how to handle endogenous explanatory variables in LCMs, we chose a two-stage control function (CF) estimation technique (Blundell and Powell 2003; Imbens and Wooldridge 2007; Howard and Roe 2013) over two-stage least squares (2SLS), because it is more appropriate for nonlinear models: although our model is linear in parameters (since we are estimating mixtures of normal distributions), nonlinearities arise when estimating the posterior probabilities at each maximization stage. The 2SLS approach would fail, because it implies approximating the endogenous variable with a linear transformation thereof, and the coefficients estimated in the second stage would then be used to compute the posterior probabilities. The nonlinear step in the computation of the posterior probabilities is therefore not invariant to the use of 2SLS. A more detailed discussion of this methodology and the instruments selected, together with the results of the estimation, can be found in the “Appendix”.
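The two stages of a control function estimator can be sketched for the simplest possible case: a single linear demand equation, one endogenous price and one instrument. This is an assumption-laden illustration, not the specification in the “Appendix”: the helper names and data are hypothetical, and in the paper the second stage is an LCM rather than OLS. In this purely linear setting the CF price coefficient coincides with the IV estimate; the two approaches differ once the second stage is nonlinear.

```python
def ols(rows, y):
    """Least-squares coefficients, solving the normal equations X'X b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def control_function(y, price, instrument):
    """Return [intercept, price coefficient, first-stage residual coefficient]."""
    # Stage 1: regress the endogenous price on the instrument, keep residuals.
    first = ols([[1.0, z] for z in instrument], price)
    resid = [p - (first[0] + first[1] * z) for p, z in zip(price, instrument)]
    # Stage 2: add the residual as an extra regressor to absorb the endogeneity.
    return ols([[1.0, p, v] for p, v in zip(price, resid)], y)

# Hypothetical data: consumption, average price and an exogenous price shifter.
consumption = [10.0, 9.2, 7.5, 7.9, 5.1, 5.0]
avg_price = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
shifter = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
coefs = control_function(consumption, avg_price, shifter)
```

A coefficient on the first-stage residual that differs significantly from zero is commonly read as evidence that the price variable is indeed endogenous.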

In order to allow for the possibility that the price elasticity differs between 2009–2010 and 2011, that is, between the period before and after the change in the price structure, we include an interaction (labelled AvP2011) between a binary indicator for 2011 (Year2011) and the average price (AvP). Following Gaudin (2006), we also incorporated into the demand equation an indicator of how well informed users are about their water tariff, through its effect on price elasticity. To do so, we interacted a binary variable indicating awareness of the price structure with the average price (resulting in variable Priceinfo).

Household income was recorded as an ordered categorical variable, with households belonging to one of the following intervals (in Euros/month): [0–1,100]; [1,101–1,800]; [1,801–2,700]; [2,701–3,500]; [3,501–\(+\infty \)]. It would not be appropriate to use the interval categories as if they were values of a continuous variable. Usually, one would construct a set of five binary indicators of income level and introduce four in the model. However, because we did not seem to have enough sample variability to estimate all four corresponding parameters, we simplified our original income variable into a binary indicator (Highincome) of relatively higher income. In particular, we created a binary variable that identifies the richer households (those falling in the two highest income categories).
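The construction of the interaction terms and of the income indicator described above can be sketched as follows. The variable names AvP, Year2011, AvP2011, Priceinfo and Highincome follow the text; the function name, the coding of the income categories as integers 1 to 5 and the input layout are assumptions made for illustration.

```python
def build_price_and_income_variables(avp, year, knows_tariff, income_category):
    """Derived regressors for one household-year.

    avp             -- average price paid by the household in that year
    year            -- calendar year of the observation
    knows_tariff    -- True if the household reports knowing the price structure
    income_category -- 1 (lowest interval) to 5 (highest interval), assumed coding
    """
    year2011 = 1 if year == 2011 else 0
    return {
        "AvP": avp,
        "Year2011": year2011,
        "AvP2011": avp * year2011,                      # price x 2011 indicator
        "Priceinfo": avp * (1 if knows_tariff else 0),  # price x tariff awareness
        "Highincome": 1 if income_category >= 4 else 0,  # two highest intervals
    }

row = build_price_and_income_variables(1.5, 2011, True, 4)
```

With this coding, the price coefficient applies to uninformed households before 2011, and the interaction coefficients measure how the price response shifts in 2011 and among informed households.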

Additionally, and based on previous literature, other variables were included in the demand function. Household size (Members) was included, following most previous studies of residential water demand. Water conservation habits (Habits) were included using an aggregate index based on different daily behaviors.Footnote 6 Several variables representative of housing equipment were also considered, such as the number of electrical appliances (Electappl) and the number of efficient water-using electrical appliances (Electeff). Finally, an indicator of home ownership was included, since homeowners are expected to have more incentives than tenants to make investments in water-saving devices in the property, as shown by Grafton et al. (2011).

As stated in Sect. 3, individuals are sorted by the LCM into groups based on the similarity of their conditional distributions. However, the probability of belonging to a certain group can be further modeled as a function of covariates that can be considered proxies for the unobserved preferences related to water demand. That is, these variables allow us to identify household characteristics for the different water user profiles. In this sense, these variables belong in the class-selection function and not, or not so prominently, in the water demand functions. This is to say, they determine how the households’ quantity demanded reacts to demand drivers, in particular how prices affect the quantity demanded of water, but not so much the quantity demanded as such.

At this stage, water suppliers would benefit from the knowledge of the main factors determining class membership, in particular the membership to classes with some specific characteristic (for example, a particularly (in)elastic response to price changes or a particularly sensitive response to moral suasion campaigns), if combined with the availability of individual data about those variables.

In practice, these variables could end up being easily observable factors (e.g. household type, presence of children, living in an apartment or not, availability of an individual meter, being a year-round versus a seasonal dweller, education levels, renter status, etc.) but they might instead be variables whose values would be costly for the water supplier to gather. The practical advantages afforded by the application of the LCM to estimate water demand would be dependent on these informational requirements, apart from having the technical sophistication and computational resources to periodically estimate the LCM itself. Footnote 7 In some cases, a more ad hoc approach to identifying consumer groups might thus end up being more efficient, particularly in those cases in which the differences among classes in terms of the most relevant estimated parameters (such as price-elasticities) are minor.

Following Russell and Fielding (2010), we use Stern (2000) as a guide when categorizing the determinants of different water demand behaviors into four types of causal variables.

Attitudinal factors are one of the causes of behaviors. According to the value-belief-norm theory (VBN), “the general predisposition to act with proenvironmental intent can influence all behaviors an individual considers environmentally important” (Stern 2000, p. 416). Therefore, we include an environmental concern index (Enviro) as the general attitude towards the environment may influence the preferences and, therefore, the membership to a certain water use profile.

The second type of causal variables that we should take into account is personal capabilities, among which we included knowledge and skills that may affect the drivers of residential water demand. In the estimation we considered a binary variable related to the knowledge of the existence of an environmental campaign (Campaign). We also included the age (Age) of the head of the household, since it has been commonly considered as a determinant of environmentalism and, more specifically, water demand behavior. However, prior literature drew mixed conclusions about this effect. Kantola et al. (1983) and Scott and Willits (1994) found a negative effect on environmental behavior, whereas Gilg and Barr (2006) and Clark and Finley (2007) found that older consumers are more likely to report water conservation intentions.

Automatic processes such as habits and routines may guide behaviors. Therefore, examining the role of habits and routines is fundamental for the analysis of water demand behaviors. As people of different ages may have different habits and routines, we included variables reflecting the proportion of members over 65 years (Old65) and those under 16 years (Young16). We can expect households with a higher proportion of younger members to have a lower response to changes in the drivers of residential water demand, due to the strong need for more frequent laundering, more frequent showers and use of water-intensive outdoor activities. On the other hand, retired people may also have a lower response, since they are likely to devote more time to activities that involve water use, such as gardening, and spending more time at home. A water-conservation habits index (Habits) was also included as a covariate.

The last category of determinants of different water demand behaviors includes contextual factors, such as physical infrastructure and technical facilities, that are also closely related to human behavior. We used a categorical variable that accounts for the number of water efficient electrical appliances in a house (Electeff).

In order to control for different consumption patterns that may have been masked by the aggregation, we included as a covariate the standard deviation in water consumption (SD) within a year. This variable is not included in the demand function, because it does not directly affect the level of water consumption but only the water consumption profiles.

Table 4 shows some descriptive statistics and descriptions for the variables included in the LCM.

Table 4 Summary statistics

6 Results

In order to select the model that best fits the data, we estimated several LCMs with different numbers of classes and compared likelihood-based model selection criteria, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as explained in Sect. 3. The selection criteria, reported in Table 5, lead to different conclusions: the BIC favors the 4-class model with variable probabilities, whereas the AIC favors a 5-class model with variable probabilities. Following Nylund et al. (2007), we selected the model that minimizes the BIC, that is, the 4-class model with variable probabilities. The results confirm that the LCM outperforms the OLS model. That is, household heterogeneity is significant, and there seem to be four distinct residential water consumer profiles in Granada for the period 2009–2011, rather than the single one assumed by the conventional OLS approach, which forces all consumers to follow the same pattern in terms of their water demand.
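
As a minimal sketch of how the two criteria can disagree, the following Python snippet computes the AIC and the BIC from a log-likelihood, a parameter count and a sample size, using the standard definitions. The log-likelihoods, parameter counts and sample size below are hypothetical values chosen for illustration only; they are not the figures reported in Table 5.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def select_models(candidates, n_obs):
    """Return the (AIC, BIC) pair per model and the name of the minimizer of each."""
    scores = {
        name: (aic(fit["log_lik"], fit["n_params"]),
               bic(fit["log_lik"], fit["n_params"], n_obs))
        for name, fit in candidates.items()
    }
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_bic = min(scores, key=lambda name: scores[name][1])
    return scores, best_aic, best_bic


# Hypothetical fits, NOT the estimates from the paper.
candidates = {
    "3-class": {"log_lik": -5300.0, "n_params": 40},
    "4-class": {"log_lik": -5200.0, "n_params": 55},
    "5-class": {"log_lik": -5170.0, "n_params": 70},
}
scores, best_aic, best_bic = select_models(candidates, n_obs=3000)
print("AIC selects:", best_aic)
print("BIC selects:", best_bic)
```

With these illustrative inputs the AIC picks the larger model and the BIC the more parsimonious one, because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is larger than about 7.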

Table 5 Selection criteria for several models

Moreover, as explained in Sect. 3, the distribution of residential water consumption in the city of Granada is asymmetric and, as shown in Fig. 2, the 4-class model fits the data better than both the OLS and the random-effects models.

Fig. 2 Distribution of residential water demand in Granada 2009–2011 (histogram). Distribution of the predicted value using OLS model (dashed line), random-effects model (grey line) and the 4-class LCM with variable probabilities (black line)

Table 6 presents selected descriptive statistics for water consumption by class. The first and the second classes have the lowest and the highest average water consumption, respectively. However, as explained in Sect. 3, it should be stressed that consumers are not sorted based on the values taken by the dependent variable (water consumption; see Footnote 8) but instead according to the similarity of their conditional distributions of the error component, as shown by the minimum and maximum values. It is also worth noting that Class 2, the smallest (representing just over 6 % of the sample), has the highest estimated standard deviation in the distribution of consumption. That is, consumption levels within this class are the most variable. Moreover, Table 7 shows that most of the explanatory variables are not significant for this class. That is, it is not possible to identify a pattern in its residential water consumption. As explained by Cameron and Trivedi (2005, p. 625), additional classes may be the result of the LCM grouping outliers. Therefore, considering that this is the smallest class and that we cannot identify the drivers of its water demand, households that belong to this class may in fact be outliers whose consumption patterns cannot easily be explained by the usual determinants of the quantity of water demanded.

Table 6 Water consumption statistics by class
Table 7 Estimated water demand models

Next, in Table 7, we present the results of the 4-class model with variable probabilities. We also report the results of the OLS model, i.e., a single-class model, and a random-effects model that will be used to assess the importance of household heterogeneity.

First, the estimation of the single-class model and the random-effects model suggests that demand is price inelastic: the price elasticities at the sample mean are \(-\)0.4368 and \(-\)0.3255, respectively, as shown in Table 8. However, turning to the 4-class model, we find that for Classes 1 and 2, which include 26.19 and 6.92 % of the observations respectively, price has no significant impact on residential water demand. Therefore, the effect of a change in price would be overestimated for those consumers in Classes 1 and 2 when using the other two models. In contrast, for the remaining classes, price is significant, but price elasticities vary across classes, with the fourth class having the most elastic water demand. This heterogeneity in terms of price elasticities is masked when demand is estimated through a single-class model or a random-effects model. As explained in Sect. 5, in order to allow for the possibility that price elasticity differs between 2009–2010 and 2011, that is, between the periods before and after the change in the price structure, we included an interaction between a dummy variable for 2011 and the lagged average price. Table 8 shows that water demand becomes more inelastic in the single-equation and random-effects estimations. In contrast, the 4-class model suggests that consumers in Class 3 have become significantly less responsive to price after the 2011 change in the price structure. That is, the single-class model and the random-effects model identify a shift in the demand function for all consumers, while the 4-class model identifies this shift only for a specific group of consumers, implying that the change in the price structure was not perceived similarly by all consumers. Furthermore, knowledge of the price structure has no significant impact on price elasticity in the 4-class model, but it has a positive and significant effect at the 10 % level in both the single-class model and the random-effects model. The lack of significance of this variable in the 4-class model may be due to the fact that knowing that the price structure is based on IBP does not imply that consumers are aware of the block in which they consume or of the set of marginal prices per block.

Table 8 Price elasticities of demand

Under the single-class model, we find that the high-income indicator has a negative and significant impact on the amount of water consumed. In the 4-class model, this only applies for Class 3. This negative effect may be reflective of water conservation measures resulting from the investment in water-saving devices by this class of consumers. The income variable coefficients corresponding to the remaining classes are not significant. Therefore, a higher level of income is not associated with a higher demand of water for all the users in the sample, as opposed to what the OLS model would suggest.

The household size has a positive estimated effect on water consumption in all the models. For comparison purposes across models, the values of average elasticities of water demand with respect to household size from the models are presented in Table 9. Overall, the elasticities with respect to family size are quite heterogeneous among classes. However, the results show that, in every case, an increase in water use is less than proportional to an increase in the number of persons per household. This is consistent with other studies that have found economies of scale (Arbués and Villanúa 2006).

Table 9 Elasticity of water demand with respect to family size

Electappl seems to have a positive and significant effect in both the OLS and the random-effects models. That is, the higher the number of electrical and water-using appliances, the higher the level of water consumption. The corresponding coefficients are significant for the four classes in the LCM and differ across classes, which suggests that the effect of a higher number of appliances on water demand is heterogeneous.

The variable Habits does not have a significant effect on water demand in any of the models that we have estimated. Self-reported water-conservation habits may not have a significant effect on water consumption after controlling for other variables. However, the effect of this variable on water demand may also depend on the habits index used and on when those habits are measured. For example, Trumbo and O’Keefe (2005) measured self-reported behaviors related to water conservation across a two-year time frame and found that self-reported water conservation in 1998 had a significant effect on conservation intentions in 2000. That is, future intentions may be affected by past water-conservation habits.

The number of efficient electrical appliances has a negative and significant effect at the 10 % level in both the single-class model and the random-effects model. However, when we estimate the 4-class model, this variable has no significant effect for Classes 2 and 3, although it does have a positive and significant effect for Classes 1 and 4. The installation of water-efficient appliances should reduce water use, although several studies have also found the opposite effect (Campbell et al. 2004; Inman and Jeffrey 2006; Fielding et al. 2012). Among the possible causes of this positive effect on water demand are a rebound effect, that is, smaller water savings than expected from the installation of water-efficient equipment due to behavioral changes that partially offset technical efficiency gains, and the fact that investments in water-efficient appliances may be related to activities that imply higher water consumption (Fielding et al. 2012). However, given the unstable behavior and high potential for bias (see Footnote 9) affecting the estimates associated with this variable and the habits index, we make no strong claims about their validity, and the resulting conclusions should be viewed with caution.

The coefficient of the binary variable indicating home ownership is not significant in any of the models estimated, indicating that households do not differ significantly in terms of their water demand depending on whether they own their home or not. This result could be due, on the one hand, to the high proportion of home ownership in Granada (as in the rest of Spain) and, on the other, to the likelihood that many of those who do not own their homes are actually students who might make a substantial proportion of their water use (for laundry, etc.) at their family home outside the city.

The lower portion of Table 7 presents the estimated coefficients of the covariates in the membership function, with Class 4 as the reference category. We can see that most of the covariates do not have statistically significant coefficients. That is, they seem to have very little predictive power regarding individuals’ preferences about water demand. Therefore, if we had divided the sample into explicitly defined groups based on these observable characteristics and self-reported valuations that a priori we expected to be proxies for unobserved preferences about water consumption, we would have misclassified individuals. In this case, LCMs seem to be adequate when some sources of heterogeneity remain unobserved.
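
To illustrate what coefficients expressed relative to a reference category imply, the sketch below assumes that class membership probabilities take a multinomial logit form in the covariates, with the coefficients of the reference class (Class 4) normalized to zero. This functional form is an assumption made here for illustration, and the covariate values and coefficients are hypothetical, not the estimates in Table 7.

```python
import math


def membership_probabilities(z, theta):
    """Multinomial-logit class probabilities (assumed functional form).

    z     : covariate values for one household, including a constant.
    theta : one coefficient vector per non-reference class.
    Returns one probability per class, with the reference class last.
    """
    scores = [sum(t_k * z_k for t_k, z_k in zip(t, z)) for t in theta]
    scores.append(0.0)  # reference class: coefficients normalized to zero
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical household: constant term and share of members under 16.
z = [1.0, 0.25]
# Hypothetical coefficients for Classes 1 to 3 (Class 4 is the reference).
theta = [
    [0.2, 1.5],   # Class 1: positive coefficient on the under-16 share
    [-1.0, 0.0],  # Class 2
    [0.1, 0.8],   # Class 3
]
probs = membership_probabilities(z, theta)
for k, p in enumerate(probs, start=1):
    print(f"Class {k}: {p:.3f}")
```

Under this form, a positive coefficient for a class means that a higher value of the covariate raises the odds of belonging to that class relative to the reference class, which is how the signs in the membership function are read in the text.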

However, there are some covariates that provide some information for identifying the different water demand profiles. Campaign, Electeff, Habits, Young16, Old65, Age and Std. Dev are all statistically significant. The first class has a positive and significant coefficient for Young16 and Old65. Therefore, the proportion of children under 16 and adults over 65 is expected to be higher than in Class 4. Class 1 is also characterized by younger household heads and a smaller standard deviation in water consumption within the year. This class also has a positive and significant coefficient for Electeff, suggesting that consumers in this class have a higher number of efficient electrical appliances than those in Class 4.

Class 2 is characterized by consumers who are aware of the existence of campaigns to promote water savings and who have fewer efficient appliances than those in the fourth class. Moreover, consumers in Class 2 have a higher standard deviation in annual water consumption. As explained above, consumers in this group lack a clear pattern of consumption that can be explained by the standard drivers of water demand: not only is the standard deviation in water consumption among consumers in this class the highest, but so is the within-year variation for each consumer.

We estimated that Class 3 is characterized by households with a significantly lower score on the water habits index, a significantly higher proportion of children under 16, and a significantly lower standard deviation in annual water consumption relative to the fourth class.

Finally, Class 4, by implication, is defined by households with a lower proportion of children (compared to Classes 1 and 3) and adults over 65 (relative to Class 1). However, the head of the household is older than those in the first class. Consumers in this class are less aware of the existence of water-saving campaigns than those in the second class, have fewer water-conservation habits than consumers in the third class, and have a lower number of efficient appliances than consumers in the first class, but a higher number than consumers in the second class. Regarding the variability (as measured by its standard deviation) of annual water consumption, households in this class have a larger dispersion than those in Classes 1 and 3, but a smaller one than households in the second class.

6.1 Sensitivity Analysis

We compared the LCM to an alternative model (see Footnote 10) in order to show that the LCM can better capture heterogeneity in the sample. As noted in the introduction, several studies have sorted consumers into different groups based on observable characteristics, such as income. In our own comparative exercise, and in order to make groups comparable to those estimated using the LCM, we divided the sample into four sub-groups based on income and the standard deviation (see Footnote 11), and we maintained the same demand specification. For the comparison, although the LCM is a probabilistic model, we grouped consumers based upon their estimated modal probability. The results are shown in Table 10.
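
The modal assignment described above can be sketched as follows: each household is allocated to the class for which its estimated posterior membership probability is highest. The posterior probabilities in this example are hypothetical, and the function names are introduced here only for illustration.

```python
from collections import Counter


def modal_class(posterior):
    """Return the 1-based index of the class with the highest posterior probability."""
    best_index = max(range(len(posterior)), key=lambda j: posterior[j])
    return best_index + 1


def class_shares(posteriors):
    """Assign every household to its modal class and compute the class shares."""
    assignments = [modal_class(p) for p in posteriors]
    counts = Counter(assignments)
    n = len(assignments)
    return assignments, {c: counts[c] / n for c in sorted(counts)}


# Hypothetical posterior probabilities for four households and four classes.
posteriors = [
    [0.70, 0.05, 0.15, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.30, 0.60],
    [0.20, 0.10, 0.55, 0.15],
]
assignments, shares = class_shares(posteriors)
print(assignments)
print(shares)
```

Hard assignment discards the uncertainty contained in the posterior probabilities, so it is best read as a device for comparing groupings rather than as part of the estimation itself.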

Table 10 Water demand, by Income/SD groups

In the spirit of Nguyen and Rayward-Smith (2008) and Eshghi et al. (2011), we used two measures to evaluate the performance of the different grouping methodologies: the homogeneity of the observations within each group, and the heterogeneity between groups.

To measure the level of homogeneity within groups, we computed the standard deviation of the residuals in each group and then we summed the indicator across groups and divided by the number of groups.

$$\begin{aligned} s(j)&= \sqrt{\frac{\sum _{i=1}^{N(j)}(\epsilon _{ij}-\bar{\epsilon }_{j})^{2}}{N(j)-1}}\end{aligned}$$
(5)
$$\begin{aligned} S&= \sum _{j=1}^{J}\frac{s(j)}{J} \end{aligned}$$
(6)
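
A minimal sketch of the homogeneity measure in Eqs. (5) and (6): the sample standard deviation of the residuals is computed within each group, using the \(N(j)-1\) denominator, and then averaged across groups. The residuals below are hypothetical and serve only to show the arithmetic.

```python
from statistics import stdev


def homogeneity(residuals_by_group):
    """Average within-group sample standard deviation of residuals, as in Eqs. (5)-(6).

    residuals_by_group maps a group label to its list of residuals;
    each group needs at least two observations.
    """
    # stdev() uses the N(j) - 1 denominator, matching Eq. (5)
    s = [stdev(res) for res in residuals_by_group.values()]
    return sum(s) / len(s)


# Hypothetical residuals for two groups.
residuals_by_group = {
    "group A": [1.0, -1.0],
    "group B": [2.0, -2.0],
}
S = homogeneity(residuals_by_group)
print(round(S, 4))
```

A smaller value of S indicates that residuals are, on average, more tightly concentrated within groups.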

To measure the level of heterogeneity between groups, we considered the ratio of the squared difference between the observed value of the dependent variable and its predicted value to the squared difference between the observed value and the value that would be predicted for that observation if it were assigned to a different group. This indicator was computed for each group, summed across groups, and then divided by the number of groups.

$$\begin{aligned} h(j)&= \sum _{i=1}^{N(j)}\frac{(y_{i}-\hat{y}_{i,j})^{2}}{(y_{i}-\hat{y}_{i,k\ne j})^{2}}\end{aligned}$$
(7)
$$\begin{aligned}&H=\sum _{j=1}^{J} \frac{h(j)}{J} \end{aligned}$$
(8)
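
The heterogeneity measure in Eqs. (7) and (8) can be sketched along the same lines. The text does not specify how the alternative group \(k\ne j\) is chosen for each observation, so this example simply takes one alternative prediction per observation as given; that choice, the function names, and all numbers below are assumptions made for illustration.

```python
def group_ratio(observations):
    """Sum of squared own-group errors over squared other-group errors, as in Eq. (7).

    observations is a list of (y, y_hat_own, y_hat_other) tuples, where
    y_hat_other is the prediction under an alternative group (assumed given).
    The ratio is undefined if y equals y_hat_other exactly.
    """
    return sum((y - own) ** 2 / (y - other) ** 2
               for y, own, other in observations)


def heterogeneity(groups):
    """Average of the group-level ratios across groups, as in Eq. (8)."""
    h = [group_ratio(obs) for obs in groups.values()]
    return sum(h) / len(h)


# Hypothetical observations: (observed, own-group prediction, other-group prediction).
groups = {
    "group A": [(10.0, 9.0, 12.0), (8.0, 8.5, 6.0)],
    "group B": [(20.0, 18.0, 16.0)],
}
H = heterogeneity(groups)
print(H)
```

For a single observation, a ratio below one means that the prediction from its own group lies closer to the observed value than the prediction from the alternative group.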

Table 11 shows the measures of homogeneity and heterogeneity calculated using the results from each methodology. These results support the conclusion that our LCM estimation provides the most homogeneous groups, while it succeeds in differentiating among these groups.

Table 11 Measures of homogeneity and heterogeneity

7 Conclusions

This study provides strong evidence of unobserved heterogeneity in residential water demand in the city of Granada for the period 2009–2011. Using an LCM, we identified four different residential water consumer profiles, rather than the common profile assumed by single-equation approaches, and this estimation allowed us to observe four distinct price responses. Moreover, our sensitivity analysis shows that the LCM technique is an appropriate method to group observations homogeneously.

Water demand is found to be perfectly inelastic for two of the classes we identified. The proportion of consumers who belong to these classes, based on the posterior probabilities, exceeds 33 %. The effect of a change in price would be overestimated for these two groups of consumers, which represent a substantial proportion of the sample, if price elasticities were estimated using a single-class model. The implementation of pricing policies would likely be less effective in reducing water consumption for these two groups of consumers in the future. In particular, one of the groups that are relatively insensitive to price changes (which represents a fourth of the households) registers low average consumption levels. This result may be in line with those obtained by estimating residential water demand functions based on a Stone–Geary utility function (Gaudin et al. 2001; Martínez-Espiñeira and Nauges 2004; Dharmaratna and Harris 2012; Garcia-Valiñas et al. 2014), which suggest the existence of a nondiscretionary amount of water that is not sensitive to price changes and an additional quantity devoted to discretionary uses that does respond to price variations.

Identifying different price elasticities allows regulators to predict more accurately the effect of different water conservation policies. Our analysis suggests that the focus of a water demand management policy could be tailored to the specific demand function of a particular group of consumers. Indeed, in order to reduce water consumption, pricing policies and non-pricing policies, such as education programs, water rationing, retrofit subsidies or public information campaigns, can be jointly applied to the most price-responsive groups of consumers. However, non-pricing policies should be intensified in the case of the least price-responsive consumers, especially for the class that has a low level of water consumption, which is likely to be mainly nondiscretionary and, therefore, hard to adjust in the short run. Promoting water-saving habits and the installation of efficient appliances could be useful to reduce both discretionary and nondiscretionary residential water consumption (Garcia-Valiñas et al. 2014). We illustrate how the analysis of membership probabilities makes it possible to identify the characteristics of the users that belong to a given class, which should make it easier to tailor water conservation programs to best suit the response patterns of different user groups.

We have shown that Latent Class Analysis has a reasonable degree of potential as a tool to improve our understanding of residential water demand. However, to our knowledge, this is the first time that Latent Class Analysis has been used in the estimation of water demand functions. It would, therefore, be interesting to replicate this type of work using similar data from other jurisdictions to find out whether and to what extent our results can be generalized further.