1 Introduction

Severe droughts facing urban populations are increasingly common events. Many water utilities facing droughts cannot raise prices due to political or regulatory constraints, leading them to embrace a variety of non-price water conservation policies. A popular non-price policy is a social comparison message (SCM), which compares a household’s water use to the water use of a peer group. A number of randomized controlled trials have established that SCMs consistently lead to conservation in the range of 3–5%, and similar approaches have been used to reduce energy consumption.Footnote 1

A consistent feature of SCMs is that they are more effective among high-use households in both water and energy (Allcott 2011; Ayres et al. 2013; Ferraro and Price 2013; Brent et al. 2015). In fact, many randomized controlled trials find no savings among users in the bottom quantiles of pre-treatment consumption (Brent et al. 2015; Torres and Carlsson 2018). One characteristic of these SCMs is that the comparison shown to high-use households is, not surprisingly, more likely to show that they use more water than their peer group. This is true even when peer groups are constructed of seemingly homogeneous households based on neighborhoods and household characteristics. We posit that the “strength” of the normative message in SCMs depends on the distance between a household’s performance (e.g., water use) and the performance of the relevant comparison group. When SCMs inform these high-use households that they are badly under-performing relative to their peers, the households receive “strong” normative messages. The argument that the strength of the normative message is critical in changing behavior is consistent with utility-theoretic models of SCMs (Taylor et al. 2018; Allcott and Kessler 2019), as well as empirical evidence from other settings such as charitable giving (Croson and Shang 2008; Shang and Croson 2009).

Combining the empirical pattern that high users are more responsive to SCMs with the design feature that high users are more likely to receive strong normative messages leads to our primary research question. What explains the pattern of treatment heterogeneity in SCMs? Is it the type of household or the content of the normative message? There are logical arguments for whether households or messages matter. The prior literature has not been able to disentangle these factors due to the high degree of correlation between pre-treatment consumption and the strength of the normative message.

The notion that the content of the social comparison a household receives matters is well developed in the literature on SCMs. Schultz et al. (2007) show that adding an injunctive norm to SCMs for energy eliminates the boomerang effect where households consuming below their peer group increase consumption. The injunctive norm, usually in the form of a smiley emoticon for low users, is now standard practice in commercial applications that use SCMs for energy and water conservation.Footnote 2 However, Allcott (2011) found that the injunctive norm combined with SCMs did not affect energy consumption in a regression discontinuity design around the thresholds for the assignment of different injunctive norms. In the literature on charitable giving, Croson and Shang (2008) find that SCMs increased donations from donors who previously contributed less than the comparison level but decreased donations from donors who previously contributed above that level. A complication in SCMs for water and energy is that conservation also generates private benefits in reduced bills. Allcott and Kessler (2019) show that households consuming above their peer group are more likely to expect that the SCM saves them money but also feel pressured and experience guilt from the SCMs.

The type of household might matter for several reasons. High-use households might have more margins of adjustment in water use, lower opportunity costs of conservation, or more ability to simply scale up similar actions that low-use households perform. Furthermore, SCM campaigns for water conservation are increasingly implemented subsequent to, or concurrently with, other drought management policies. The presence of these additional policies likely changes a household’s opportunity costs of conservation, which is one of the factors that might make a household more responsive to an SCM. Customers who have already reduced their water use in response to prior conservation policies may be less responsive to SCMs relative to those who have not. This is particularly relevant for locations suffering through multi-year droughts, where utilities persistently ask for additional conservation. In this context, prior conservation may be a stronger predictor of customer response to SCMs than pre-treatment water use, which is commonly used in targeted SCM campaigns for cost-effectiveness. As utilities continue to implement SCMs within a suite of conservation policies, it is important to understand how these nudges fit within a holistic water conservation policy.Footnote 3 The effect of prior conservation on future conservation is a question of broader interest if prior conservation efforts diminish the ability of households to conserve in the future.Footnote 4

We address these questions by implementing a randomized field experiment with the Truckee Meadows Water Authority (TMWA) in Reno, Nevada. Our experiment had two important features. First, the experiment was conducted in the context of a multi-year drought in conjunction with TMWA’s broader drought management plan. In the summer of 2014, drought conditions required TMWA to make a voluntary appeal that each customer reduce consumption by 10% relative to the summer of 2013, implemented primarily through a large-scale public outreach campaign.Footnote 5 The drought persisted, and in the summer of 2015 TMWA replicated and extended the public appeal for a 10% reduction across the whole summer season. Second, we developed a new SCM that compares households’ percentage change in consumption relative to the same month in 2013 to the corresponding percentage reduction of similar neighbors. This new social comparison leverages the utility-wide public appeal for conservation and decouples the type of message a household receives from their pre-treatment water consumption. We compare this novel treatment to a separate treatment arm that received the traditional SCM in gallons. In this paper we refer to the traditional SCM in gallons as T1 and the new SCM in percentage reductions as T2. The voluntary appeal for conservation in 2014 also allows us to explore treatment heterogeneity based on households’ response to the 2014 appeal, which we call prior conservation. We define prior conservation as the percentage change in consumption from summer 2013 to summer 2014.

Our research design makes two primary contributions to the literature. First, we examine how the content of the normative message impacts the response to SCMs. Prior studies using conventional social comparisons in gallons (or kilowatt hours) are unable to disentangle whether high-use households are more responsive because they have lower opportunity costs of conservation or because they are more likely to receive an SCM with a strong signal. Consistent with previous studies, in our traditional SCM in gallons (T1), pre-treatment consumption is highly correlated with the strength of the normative message. High users are more likely to be informed that they use more water than their peer group. The key feature of our study design is that our new SCM in percentage terms (T2) generates the peer comparison using percentage changes in consumption rather than the level of water use. Our new peer comparison based on changes in consumption has a much lower correlation between pre-treatment water use and the strength of the normative message. Consequently, relative to the traditional SCM, low-use households in our new SCM have a higher probability of receiving a “strong” message informing them that they are performing worse than their peers. In T2, prior conservation is more strongly correlated with the strength of the normative message, because households that have undertaken less conservation relative to 2013 are more likely to be informed that they are contributing less toward the 10% reduction goal than their peer group.

Second, we examine how prior conservation is related to the opportunity cost of conservation. Examining the effect of prior conservation requires both the existence of conservation policies prior to the implementation of SCMs and an identification strategy to estimate the effect of prior conservation separately from pre-treatment consumption. Our setting is ideal to address the effect of prior conservation given that the experiment took place in the second year of a multi-year drought where the utility called for a voluntary appeal in the first year. From an econometric perspective, we can identify the effect of both pre-treatment water use and prior conservation on treatment heterogeneity because they have a relatively low correlation and our randomized treatment assignment is balanced across both variables.

The role of prior conservation can best be described through an example. Consider a household that adjusted their irrigation controller in response to the 2014 appeal. They will likely do so again (or the controller only required a one-time adjustment) in response to the 2015 appeal, regardless of whether they receive an SCM. This household will have a higher opportunity cost for additional conservation beyond what they did in 2014 compared to a household that did not adjust their irrigation controller in 2014. This example shows that when general conservation policies exist, such as TMWA’s utility-wide voluntary reductions, the conservation generated from an SCM may depend on households’ responses to pre-existing conservation policies. Therefore, we argue that prior conservation is related to the opportunity cost of further conservation, which may also be an important driver of heterogeneity when analyzing behavioral nudges for water conservation.

In our experiment each of the SCM treatments generates statistically significant average treatment effects (ATEs) of roughly 1.5%. We investigate treatment heterogeneity based on both pre-treatment consumption and prior conservation. The strength of the normative message is a major driver of customer response. Prior conservation generates treatment heterogeneity when it is closely correlated with the strength of the normative signal, as is the case in our SCM in percentage terms. By contrast, prior conservation has no significant effect on treatment heterogeneity for the traditional SCM in gallons. Pre-treatment water use increases the effectiveness of both SCMs (T1 and T2), but to a greater extent when it is correlated with strong normative signals (T1).

The results support two conclusions. First, since the patterns of heterogeneity align closely with the strength of the normative signal, we argue that SCMs likely need to convey that a household is under-performing relative to their peers to generate significant conservation. Since pre-treatment water use appears to increase the magnitude of treatment effects regardless of its correlation with the normative signal, we cannot rule out that opportunity costs also play a role in the response to SCMs. Second, prior conservation is a less important proxy for opportunity costs of conservation than pre-treatment water use in our setting.

One implication of our results is that the new SCM in percentage terms can be effective among low users who have not responded to earlier non-SCM conservation policies. Achieving conservation from low users contrasts with prior studies finding almost no treatment effects for low-use households, since they would be unlikely to receive a strong normative message in a gallons-based SCM. Sending strong normative messages to low users generates conservation among customer classes that were not previously receptive to SCMs, which may be particularly important when significant water curtailments are necessary during extreme droughts. A simple targeting rule based on prior conservation leads to a 38% increase in the aggregate gallons saved, showing the importance of searching for the mechanisms behind established patterns of heterogeneity.

This research is related to the work on the heterogeneous impacts of SCMs for water and energy conservation. Nemati et al. (2019) find that low users in the bottom two quintiles of pre-treatment consumption do not conserve any water when treated with an SCM and water data analytics in Folsom, California. In fact, the bottom quintile of treated households has a statistically significant increase in water use. Goette et al. (2019) show that low users have almost no response to a water conservation treatment including an SCM that generated savings of roughly 6 liters per day. Ferraro and Price (2013) and Ferraro and Miranda (2013) show that among three different information treatments for water conservation in Cobb County, Georgia, the SCM generated the largest differential effect by pre-treatment water use. While low-use households still have statistically significant CATEs, high users save 2–3 times more water than low users. Bhanot (2018) finds similar effects for SCM messages in California; most of the conservation is concentrated in the higher pre-treatment water use deciles and most lower deciles have CATEs close to zero. Torres and Carlsson (2018) examine the spillover effects of SCMs in Colombia and find insignificant CATEs for low users and statistically significant CATEs three times larger for high users. Brick et al. (2017) conduct a variety of randomized treatments, including an SCM, for water conservation to cope with severe drought in Cape Town, South Africa. They also find that households in the lowest quintile of pre-treatment consumption are not responsive to the SCM, while higher quintiles do conserve in response to treatment. Similar effects have been observed in energy SCMs; Allcott (2011) shows a nearly monotonic relationship between pre-treatment energy consumption and the effectiveness of SCMs across 17 experimental samples.

This research is also related to the work on how behavioral nudges interact with alternative existing policies. Pellerano et al. (2017) interact social comparisons with features of electricity tariffs and find evidence for crowding out of intrinsic motivations. West et al. (2019) examine the interaction of social comparisons with water restriction policies, finding little evidence of crowding out. Brent et al. (2015) find that social comparisons increase the probability of signing up for alternative utility water conservation policies. Brent and Wichman (2020) find little interaction between prices and social comparisons in a large southern California utility. We contribute to this research by highlighting how social comparisons can be integrated into utility-wide voluntary appeals for conservation to produce additional conservation, including a new SCM format that draws upon the prosocial contribution to the regional drought.

We also contribute to research on the mechanisms through which behavioral nudges operate. A simple dichotomy is that SCMs can impose a “moral tax” on consumption (Levitt and List 2007) or provide privately beneficial information to assist in optimizing a household production function (Becker 1965). For example, an SCM may reduce consumption because a household feels guilty about consuming more than their peers. Alternatively, the fact that similar households use less water may prompt a household to investigate ways to reduce their bill.Footnote 6 The re-optimization mechanism is related to research on “internalities” - the failure of consumers to fully account for all the private costs of consumption (Allcott and Kessler 2019; Allcott et al. 2014; Allcott and Sunstein 2015; Allcott and Taubinsky 2015).Footnote 7 Allcott and Kessler (2019) argue that the moral mechanism of response can reduce the welfare gains from SCMs, which may be welfare-reducing for a sizable portion of households.

2 Background and Experimental Design

The study was conducted in the metropolitan area of Reno, Nevada, an arid city of approximately one-quarter million people in the western United States. Water supply to the Reno metro area is primarily provided by the Truckee Meadows Water Authority (TMWA). The Truckee River is the primary source for TMWA’s water supply, which relies on snowmelt from the Sierra Nevada as well as storage provided by Lake Tahoe and Pyramid Lake. Water demand is highly seasonal, with the peak demand period coming in the summer to meet demand for residential irrigation. In 2015, in response to an expected drought, TMWA launched a major media campaign to reduce water use during the summer irrigation season by requesting that each TMWA customer use 10% less water from May through September 2015, relative to their water use during the same months in 2013. TMWA used bill inserts as well as a wide variety of media including print, radio, TV, social media, and billboard messages to publicize the conservation message. TMWA used 2013 as the comparison year for the 10% reduction because they had also asked for conservation during the summer of 2014. Such a conservation request was uncommon in the region prior to 2014; the last time TMWA made a request was during a drought in 1992.

We employ our new SCMs in a utility-scale randomized field experiment conducted in partnership with TMWA. Single-family customers received one of five mailers, with approximately 4,300 households included in each of five treatment groups, and 21,552 in the control group. The control group did not receive any informational mailers, but both treatment and control groups were exposed to the drought messaging asking them to voluntarily reduce consumption by 10% from the media, billboards, and messages printed on monthly bills. Two of the five treatments include an SCM: one is a traditional SCM in terms of total gallons used by the household relative to a peer group; the other is our new SCM using percentage differences as described above. Both SCMs describe TMWA’s goal for each household to use 10% less water for each month of the summer of 2015 relative to the summer of 2013 to cope with the drought. In this paper, we focus primarily on the two SCMs; we refer to the traditional SCM in gallons as T1 and the new SCM in percentage reductions as T2. Figure 1 provides a timeline of the water conservation policies TMWA imposed, giving context for where our experiment fits into the utility’s broader water management policy.

Fig. 1 Timeline of Water Conservation Policies. Note: The figure presents the timing of voluntary restrictions in the years prior to the drought as well as the timing of our experiment

2.1 Description of Treatments

This article focuses on the two SCMs (T1 and T2) out of the five employed in the overall field experiment. Since the treatments of interest contain components of the first two treatments we present a brief description of all five treatments. Table 1 summarizes the information in all treatments and the "Appendix" includes example components of the five mailers. Every letter began: “Because of the extended drought in Northern Nevada, we are asking all of our customers to reduce water use by at least 10% this summer compared to summer 2013 - the last summer before TMWA started asking for summer water use reductions.” All letters also included the statement: “Since TMWA customers use on average about four times more water in summer than in the winter, we expect that for most customers the easiest way to achieve this reduction is to adjust outdoor watering.” We reference the three treatments that are not analyzed in this paper as A1, A2, and A3. Treatment A1 provided generic tips, treatment A2 augmented the generic tips with personalized information about the customer’s water use, and treatment A3 contained all the information in A2 along with information on financial savings and the increasing block rate structure. A more detailed description of the three treatments not analyzed in this paper is available in the "Appendix".

In addition to the tips and personalized information, T1 contained a social comparison message under the header “How does your water use compare?” The core of T1 is a graphic comparing the customer’s total water use in kgal for the last billed month to the median water use of a peer group consisting of single-family residences in their neighborhood with similar yard size and number of bedrooms.Footnote 8 In essence, T1 reproduces the standard SCM used in the OPower studies on energy and the Cobb-County (Ferraro et al. 2011) and Watersmart (Brent et al. 2015) experiments in water (see "Appendix" Fig. 7).

T2 was similar to T1 except that the SCM was framed in terms of relative performance toward achieving the 10% goal and the comparison graphic was based on the percentage change in water use in the previous billed month relative to the same month in 2013 ("Appendix" Fig. 8). The comparison group was identified in the same way as T1. We also included injunctive norms in the form of a message rather than emoticons or “smiley faces” as in Schultz et al. (2007). Residences that had met their 10% goal in the last billed month received the message “Keep up the good work.” Residences that did not meet their 10% goal in the previous month received the message “As a reminder TMWA is asking all customers to do their best to save at least 10% this summer. Please do your part to help with drought.” The injunctive norms for T1 and T2 were both based on whether households had met the 10% goal and were not based on the peer comparison.

Importantly, T1 and T2 contain the same information on an individual household’s performance towards their 10% goal and the same injunctive norm. The only difference between the T1 and T2 mailers is that the peer comparison in T1 is based on total gallons and the T2 peer comparison is based on changes in consumption. Figures 4, 5, 6, 7 and 8 show what mailers one specific household would have received had they been assigned to each of the five treatments. As seen in Figs. 7 and 8, our example household used less water than their peers but simultaneously had not reduced water use by as large a percentage as their peers. Therefore, this household would have received a stronger normative message if assigned to T2 rather than T1, without changing any household characteristics. The reverse situation could also be true: a household that used more water than its peers but reduced by a larger percentage would receive a stronger normative message if they were assigned to T1 rather than T2.
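The contrast between the two peer comparisons can be sketched in a few lines of Python. This is an illustration only, not the code used to generate the mailers: the function names, the sign conventions, and all numbers are hypothetical, and the peer benchmark is taken to be the peer-group median as described for T1.

```python
from statistics import median

def pct_change(current, baseline):
    """Percentage change relative to the baseline month (negative = reduction)."""
    return 100.0 * (current - baseline) / baseline

def t1_gap(use_kgal, peer_use_kgal):
    """Gallons-based gap: household use minus the peer median (kgal).
    Positive values mean the household uses more than its peers."""
    return use_kgal - median(peer_use_kgal)

def t2_gap(use_now, use_2013, peer_pct_changes):
    """Percentage-based gap: household % change minus the peer median % change.
    Positive values mean the household has cut back less than its peers."""
    return pct_change(use_now, use_2013) - median(peer_pct_changes)

def met_goal(use_now, use_2013, goal_pct=10.0):
    """True if use fell by at least goal_pct relative to the same month in 2013."""
    return pct_change(use_now, use_2013) <= -goal_pct

# Hypothetical household and peers (illustrative numbers, not study data).
household_now, household_2013 = 12.0, 12.5   # kgal
peer_use_now = [14.0, 15.0, 16.0]            # kgal
peer_pct_changes = [-12.0, -15.0, -9.0]      # % change vs. 2013

gap_gallons = t1_gap(household_now, peer_use_now)
gap_percent = t2_gap(household_now, household_2013, peer_pct_changes)
goal_reached = met_goal(household_now, household_2013)
```

In this hypothetical case the household sits below its peers in gallons (a weak message under a T1-style comparison) but has reduced by a smaller percentage than its peers (a strong message under a T2-style comparison), while the injunctive message depends only on whether the 10% goal was met.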

Table 1 Information included in the five treatments

2.2 Randomization

Our sample frame included 42,703 eligibleFootnote 9 single family homes, which we randomly assigned to either the control group or one of the five treatment groups. Randomization blocks were defined by billing cycles, rate schedule, and frequency of recorded meter data (i.e. monthly, daily, or hourly, though all customers only receive monthly usage totals). The "Appendix" provides more details about our randomization procedure and the process of generating the mailers. In total, 21,151 treatment households were assigned to receive mailers, of which 4,231 were selected for T1 and 4,217 were selected for T2. The control group consists of 21,552 households. We also randomized whether households received one or two mailers. Across T1 and T2, a total of 2,839 households were assigned to receive a single mailer in July (using June consumption as the last month billed), 2,819 received a single mailer in August (using July consumption as the last month billed), and 2,790 received mailers in both July and August. Table 2 shows the number of observations by treatment and timing of mailers.
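Blocked random assignment of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure documented in the "Appendix": the field names, the 50% control share, the equal split across arms, and the fixed seed are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

ARMS = ["A1", "A2", "A3", "T1", "T2"]

def block_randomize(households, seed=0, control_share=0.5):
    """Assign households to control or one of five arms within blocks.

    households: list of dicts with keys 'id', 'billing_cycle',
    'rate_schedule', and 'meter_freq' (hypothetical field names).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for h in households:
        key = (h["billing_cycle"], h["rate_schedule"], h["meter_freq"])
        blocks[key].append(h["id"])

    assignment = {}
    for key in sorted(blocks):
        ids = blocks[key]
        rng.shuffle(ids)  # random order within the block
        n_control = round(len(ids) * control_share)
        for hid in ids[:n_control]:
            assignment[hid] = "control"
        # Deal the remaining households across the arms in turn.
        for j, hid in enumerate(ids[n_control:]):
            assignment[hid] = ARMS[j % len(ARMS)]
    return assignment

# Hypothetical sample: two blocks of 20 households each.
sample = [
    {"id": i, "billing_cycle": i % 2, "rate_schedule": "R1", "meter_freq": "monthly"}
    for i in range(40)
]
assignment = block_randomize(sample)
```

Randomizing within blocks guarantees that every block contributes to each group in fixed proportions, so blocking variables are balanced by construction rather than only in expectation.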

Table 2 Total treated households by month/treatment

Table 3 shows that each treatment is balanced relative to the control and that treatments are balanced relative to each other. Additionally, Tables 8, 9, 10, 11 and 12 in the "Appendix" show that the whole experimental sample is well balanced and that each treatment is balanced relative to the control and to each other within quartiles of pre-treatment consumption. Figure 9 graphically displays the densities of pre-treatment consumption for the pooled treatment, each individual treatment, and the control group. In addition to achieving balance on average pre-treatment consumption, Fig. 9 in the "Appendix" shows the treatments are balanced across the full distribution of pre-treatment consumption. The graphical evidence is formalized by nonparametric Kolmogorov-Smirnov tests (Table 13 in the "Appendix") that fail to reject the null of equality of distributions for pre-treatment consumption across the control and the pooled treatment as well as each treatment individually. Our sample is well balanced by design, which allows us to make valid inferences for the conditional average treatment effects within subgroups, particularly subgroups that are functions of pre-treatment consumption and prior conservation.

Attrition in our setting is primarily due to utility accounts closing for some reason, such as a household moving (only five households out of the full treated sample of 21,151 called the utility to opt out of the experiment, which we consider inconsequential). In T1, 83 of 4,231 (2.0%) households dropped out of the sample for administrative reasons, while 87 of 4,217 (2.1%) households assigned to T2 dropped out. Tests for equality of proportions fail to reject equal attrition rates across treatment groups. Attrition in the control group was slightly higher at 2.8%, and we do reject equality of proportions between both treatments and the control. We are not sure why the attrition rate was slightly higher in the control, but we are reassured that the rates are similar in magnitude even though the difference is statistically detectable. In any case, we do not believe attrition is a validity threat since a household’s decision to move homes is very unlikely to be related to the information treatment they received in the experiment.
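For readers who want to check the T1 versus T2 attrition comparison, a standard pooled two-proportion z-test can be computed directly from the counts reported above. The text does not state which test variant was used, so the pooled z-test below is an assumption, and the function name is hypothetical.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample z-test for equality of proportions (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Attrition counts reported in the text: T1 = 83 of 4,231; T2 = 87 of 4,217.
z, p_value = two_proportion_ztest(83, 4231, 87, 4217)
```

With these counts the test statistic is small in absolute value and the p-value is far above conventional significance levels, consistent with the statement that attrition does not differ detectably between the two SCM arms.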

Table 3 Balance on observables

3 Methodology

The primary variable of interest is monthly household water consumption, obtained from TMWA billing records, expressed in average gallons per day (GPD). We calculate GPD by dividing total billing cycle usage by the number of days in that billing period to avoid problems with billing periods of different lengths. Our regression analysis uses “normalized GPD” as the main dependent variable, in which each customer’s GPD is divided by the average control group consumption across the experimental period (July–September 2015) following Allcott (2011). This allows the regression coefficients to be interpreted as the average percent change in consumption, while preserving the treatment effect of very high water users, which the logarithmic transformation of consumption would dampen. Our specification is:

$$\begin{aligned} y_{it} = \alpha + \gamma _l T_{i,l} + \beta \mathbf {x_{it}} + \epsilon _{it} \end{aligned}$$
(1)

where \(y_{it}\) is normalized GPD, \(T_{i,l}\) is an indicator variable for the pooled treatment and each of the two treatment letters (\(l \in \{\text{Pooled},1,2\}\)), and \(\mathbf {x_{it}}\) is a vector of control variables. We restrict our sample to the post-intervention period (consumption in July–September), which comprises the billing months of August, September, and October 2015. While treatment is exogenous by virtue of the randomization, including control variables increases the precision of the estimates. All regressions therefore include average consumption during irrigation seasons prior to the intervention (pre-treatment water),Footnote 10 billing cycle and month fixed effects, and average daily temperature and days of precipitation during the billing cycle. Average pre-treatment consumption is used interchangeably with “baseline consumption” or “baseline water” throughout the paper. We matched daily weather data from the NOAA weather station at Reno-Tahoe Airport to the exact dates of each customer’s water bill to calculate the weather variables. Robust standard errors are clustered at the household level.
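The construction of the dependent variable can be illustrated with a short Python sketch. The billing figures are hypothetical and chosen only to make the arithmetic transparent; the function names are not from the study.

```python
from statistics import mean

def gallons_per_day(total_gallons, days_in_cycle):
    """Average daily use over one billing cycle."""
    return total_gallons / days_in_cycle

def normalize(gpd, control_mean_gpd):
    """Scale GPD by mean control-group GPD over the experimental period."""
    return gpd / control_mean_gpd

# Hypothetical control bills as (total gallons, days in cycle); not study data.
control_bills = [(15000, 30), (18600, 31), (12000, 30)]
control_mean = mean(gallons_per_day(g, d) for g, d in control_bills)

# Hypothetical treated bill.
treated_gpd = gallons_per_day(14775, 30)
treated_norm = normalize(treated_gpd, control_mean)
```

Here the control mean is 500 GPD and the treated bill normalizes to about 0.985, that is, 1.5% below average control consumption. Because the scaling is linear, a coefficient of \(-0.015\) on a treatment indicator reads as a 1.5% reduction relative to the control mean, and very high users retain their full weight in the estimate.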

3.1 Identifying the Effect of Difference from the Peer Group and Prior Conservation

Figure 2 illustrates the existing correlation between baseline water use and the distance from the peer group for the traditional SCM in gallons (T1) and the percentage change SCM (T2).Footnote 11 For each treatment, we partition households into quartiles of baseline water use (displayed on the x-axis), and within each of those quartiles we further partition households into quartiles of the difference between a household’s level of consumption (or conservation rate) and that of its peer group. Since comparisons are based on median consumption (or percent conservation) within the peer group, the first two quartiles (Q1:Much better and Q2:Better) are households who are doing better than their peers, and the upper two quartiles (Q3:Worse and Q4:Much Worse) represent households who are doing worse.

Panel (a) of Fig. 2 shows that for the traditional SCM in gallons, most low water users (Q1 on the x-axis) unsurprisingly consume less water than their peer group: roughly 90% of the consumers with the lowest baseline water consumption were informed that they used less than their neighbors (Q1 + Q2 of Difference from Peer group (kgal)).Footnote 12 Likewise, most high users (Q4 on the x-axis) received a message telling them they used more water than their peers.

This is not the case for our SCM in percentage terms (T2): a substantial fraction of low users conserved less than their peers and many high users conserved more than their peers (Fig. 2, Panel (b)). A substantial proportion of households in the bottom quartile of baseline water consumption received a strong normative appeal. Likewise, some households with high baseline water use reduced consumption by a larger percentage than their peer group and received a weak normative appeal. The distribution of norms within each quartile of baseline consumption is well balanced for the conservation rate comparison treatment.

Fig. 2 Strength of Normative Message by Quartiles of Baseline Consumption & Prior Conservation. Note: The graph displays the percentage of households receiving messages divided up by quartiles of the performance relative to the peer group within each quartile of baseline (pre-treatment) consumption and prior conservation (\(\% \Delta W\)). The x-axis displays the quartiles of baseline consumption or prior conservation and the y-axis displays the percentage of households receiving a given message. The performances relative to the norm are designated by the different colored bars. The performance relative to the peer group is defined based on quartiles of the difference between a household’s consumption (panel (a) T1) or conservation rate (panel (b) T2) and the peer group’s consumption

Panels (c) and (d) in Fig. 2 repeat this exercise based on prior conservation instead of baseline consumption. Our primary measure of prior conservation is \(\% \Delta W\), which is the percentage change in consumption during the 2014 irrigation season relative to the 2013 irrigation season. Low values of \(\% \Delta W\) (negative, and high in absolute value) represent households who significantly reduced consumption in 2014, and therefore have high prior conservation. Households with higher values of \(\% \Delta W\) either reduced consumption by smaller amounts, or increased consumption, in 2014. The cutoffs for the quartiles of \(\% \Delta W\) are as follows: Q1 is below \(-16\)%; Q2 is between \(-15\) and \(-4\)%; Q3 is between \(-3\) and 8%; and Q4 is above 8%. Therefore the change in consumption in 2014 relative to 2013 was negative for all households in Q1 and Q2, positive for some Q3 households and negative for others, and positive for all Q4 households. We will use \(\% \Delta W\) and prior conservation interchangeably. It is important to reiterate that lower values of \(\% \Delta W\) correspond to higher levels of prior conservation.
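The mapping from seasonal consumption to a prior-conservation quartile can be sketched as below. The reported cutoffs are rounded, so the exact boundaries used here (cut points at −16, −4, and 8, with boundary values falling into the lower quartile) are an assumption made for illustration, as are the function names and example values.

```python
from bisect import bisect_left

# Approximate cut points in percent, taken from the rounded cutoffs in the text.
# Treatment of values exactly on a boundary is an assumption.
CUT_POINTS = [-16.0, -4.0, 8.0]

def pct_delta_w(use_2014, use_2013):
    """Percentage change in irrigation-season use, 2014 relative to 2013."""
    return 100.0 * (use_2014 - use_2013) / use_2013

def prior_conservation_quartile(delta_w):
    """Return 1-4; Q1 holds the largest reductions (highest prior conservation)."""
    return bisect_left(CUT_POINTS, delta_w) + 1

# Hypothetical household: 500 GPD in summer 2013, 450 GPD in summer 2014.
delta = pct_delta_w(450.0, 500.0)
quartile = prior_conservation_quartile(delta)
```

The hypothetical household reduced use by 10%, which places it in Q2. Note the direction of the scale: a lower quartile number means a more negative \(\% \Delta W\) and therefore more prior conservation.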

The bottom two panels of Fig. 2 show that prior conservation is highly correlated with the normative message for the SCM based on percentage reductions (T2), and less correlated with the normative message for the traditional SCM in gallons (T1). In fact, T1 and T2 show opposite relationships when comparing the strength of the normative message to baseline water use and prior conservation. This allows us to evaluate the impact of baseline water use and prior conservation for different SCMs, where the strength of the message is targeted at different types of households.

Importantly, prior conservation and baseline water use are not highly correlated: the correlation coefficient is 0.04, and the treatment groups are well balanced across prior conservation.Footnote 13 These two features of our data allow us to examine treatment heterogeneity across both baseline water use and prior conservation.

We analyze heterogeneity based on pre-treatment water use and prior conservation by estimating conditional average treatment effects (CATEs). We focus on CATEs based on subgroups above and below the median baseline water use and prior conservation as well as quartiles of each variable. Since treatment is randomized across the distribution of pre-treatment water use and prior conservation, CATEs provide valid inference: the results can be interpreted as causal treatment effects in the same style as studies that condition on pre-intervention consumption (Allcott 2011; Ferraro and Miranda 2013; Brent et al. 2015).

The CATE model is defined as

$$\begin{aligned} y_{it} = \alpha + \sum _{c=1}^k \gamma _{l,c} T_{i,l}\times C_{i,c} + \sum _{c=1}^k \theta _{c} C_{i,c} + \beta {\mathbf {x}_{\mathbf {it}}} + \epsilon _{it} \end{aligned}$$
(2)

In this model we are concerned with \(\gamma _{l,c}\), which is the CATE for treatment letter l in subgroup c. \(T_{i,l}\) is an indicator for whether a household was treated with letter l and \(C_{i,c}\) is an indicator for whether a household falls into subgroup c of the conditioning variable. Including \(C_{i,c}\) on its own accounts for the sample-wide differences in consumption for subgroup c. The regressions used to generate Fig. 3 define \(C_{i,c}\) as the four quartiles of pre-treatment consumption for panel (a) and prior conservation for panel (b).
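To build intuition for \(\gamma _{l,c}\), consider a stripped-down version of Eq. 2 with a single treatment and without the controls \(\mathbf{x}_{it}\). In that saturated case the OLS estimate of the CATE for subgroup c equals the difference in mean outcomes between treated and control households within that subgroup. The sketch below computes this difference; it is an illustration under those simplifying assumptions, not the estimator used for the reported results, and the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def subgroup_cates(records):
    """records: iterable of (outcome, treated, subgroup) tuples.
    Returns {subgroup: treated mean minus control mean}."""
    cells = defaultdict(lambda: {True: [], False: []})
    for y, treated, c in records:
        cells[c][bool(treated)].append(y)
    return {c: mean(cells[c][True]) - mean(cells[c][False])
            for c in sorted(cells)}

# Hypothetical normalized consumption for two quartiles of baseline use.
records = [
    (100.0, 0, 1), (102.0, 0, 1), (101.0, 1, 1), (101.0, 1, 1),  # Q1
    (100.0, 0, 4), (104.0, 0, 4), (97.0, 1, 4), (99.0, 1, 4),    # Q4
]

cates = subgroup_cates(records)
```

Here the toy data are constructed so that the effect is zero in Q1 and negative in Q4, mimicking the qualitative pattern of savings concentrated among high users. Adding controls and clustering standard errors, as in the main specification, requires a regression package.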

4 Results

4.1 Base Results

We begin by reporting the average treatment effects pooling the two treatments of interest, and then briefly discuss each treatment individually. Column (1) of Table 4 shows that the average treatment effect (ATE) pooling both treatments is slightly greater than a 1.5% reduction in consumption.Footnote 14 Overall, our pooled ATE is smaller than commonly reported for SCMs: Opower’s interventions typically reduced energy consumption by about 2%, and both Ferraro and Miranda (2013) and Brent et al. (2015) find average reductions in consumption of approximately 5%. However, these results should be considered in the context of an extensive utility-wide water conservation campaign during the second year of a severe drought. Additionally, given that the aforementioned studies on water examine some of the first interventions using SCMs for water conservation, the lower treatment effects are consistent with the findings of Allcott (2015) that initial sites often have higher average treatment effects than subsequent sites.

Column (2) breaks down the treatments individually. Each treatment generated statistically significant reductions in consumption and the point estimates are all very close to each other. Columns (3)–(4) reproduce the ATE for each letter in separate regressions using the individual treatment group and the control. This simply demonstrates that both the point estimates and the standard errors are almost identical whether we use the entire sample with two treatment indicator variables or restrict the sample to one treatment and the control. Restricting the sample simplifies the presentation of the results. Column (5) presents an interaction of the pooled treatment effect with the T2 indicator. The interaction term tests for a differential effect between T1 and T2; this term is economically and statistically insignificant, indicating no differences in the ATEs for T1 and T2. All subsequent regressions also include controls for temperature, precipitation, bill cycle fixed effects, month fixed effects, and average pre-treatment consumption.

Table 4 Base regression

4.2 Treatment Heterogeneity

Next, we show that results for both of our treatments are consistent with prior research, which found that treatment effects are concentrated among households with high pre-treatment water use. Panel (a) of Fig. 3 displays CATE results by quartiles of baseline consumption for T1 and T2 based on the model in Eq. 2, where the conditioning variables are quartiles of baseline consumption (\({\bar{W}}\)). Each graph reflects the results of one regression of the CATEs; the shaded bars are the point estimates and the error bands are the 95% confidence intervals. There is a positive relationship between pre-treatment consumption and the estimated CATE for both T1 and T2. Households with higher baseline consumption responded more strongly to the traditional SCM treatment in gallons (T1) as well as the treatment in percentage reduction (T2), although the estimated effects are only significantly different from zero for the fourth consumption quartile for T1 and the third consumption quartile for T2. The CATE for the highest quartile of baseline consumption (Q4) is almost twice as large (3.9% vs. 2.2%) for the SCM in levels (T1) as for the treatment in percentages (T2). Again, one explanation for this is that Q4 households were much more likely to receive a stronger normative message under T1 than T2 because of the correlation patterns shown in Fig. 2. Over 65% of Q4 households received a T1 message stating that they were performing much worse than their peer group, compared to only 30% of the Q4 customers who received T2 (see Fig. 2).

Fig. 3

Conditional Average Treatment Effects by Quartiles of Pre-treatment Consumption & Prior Conservation. Note: Each of the four graphs represents the output of one regression where the dependent variable is normalized average daily water consumption. The bars are the point estimates of the CATEs for each quartile of pre-treatment consumption in panel (a) and prior conservation in panel (b). The error bars are 95% confidence intervals constructed from robust standard errors clustered at the household level. All regressions include controls for temperature, precipitation, bill cycle fixed effects, month fixed effects, and pre-treatment consumption

Panel (b) of Fig. 3 displays CATE results by quartiles of prior conservation (\(\%\Delta W\)) for each treatment based on Eq. 2. Recall that Q1 represents the highest level of prior conservation and Q4 the lowest. Prior conservation does not appear to drive any treatment heterogeneity in the SCM in gallons; all CATEs are similar in magnitude and not significantly different from zero. Conversely, the SCM in percentage terms appears to have CATEs that are monotonically increasing with lower levels of prior conservation, similar to the pattern of CATEs for the SCM in gallons based on pre-treatment consumption. Since prior conservation is highly correlated with the type of message that households receive in T2, the pattern of heterogeneity appears linked to the normative message in the SCM.

We investigate the impact of both pre-treatment consumption and prior conservation as drivers of heterogeneity in two ways. First, we separately estimate treatment effects for four sub-samples defined according to whether they were above or below the median of pre-treatment consumption and the percentage change in consumption (\(\% \Delta W\)). Recall that households who substantially decreased consumption during the 2014 drought have low \(\% \Delta W\), which is synonymous with high prior conservation. The households in the four sub-samples are specified as follows:

  1. Low \({\bar{\varvec{W}}}\)-Low \(\% \varvec{\Delta W}\): below median pre-treatment water use and below median \(\% \Delta W\) (low water use & high prior conservation),

  2. Low \({\bar{\varvec{W}}}\)-High \(\% \varvec{\Delta W}\): below median pre-treatment water use and above median \(\% \Delta W\) (low water use & low prior conservation),

  3. High \({\bar{\varvec{W}}}\)-Low \(\% \varvec{\Delta W}\): above median pre-treatment water use and below median \(\% \Delta W\) (high water use & high prior conservation),

  4. High \({\bar{\varvec{W}}}\)-High \(\% \varvec{\Delta W}\): above median pre-treatment water use and above median \(\% \Delta W\) (high water use & low prior conservation).

These categories help delineate households’ opportunity costs of conservation, as well as which treatment is likely to send a strong normative message. We expect households with low baseline water use and high prior conservation (Low–Low) to have higher opportunity costs of further conservation than households with high baseline water use and low prior conservation (High–High). Therefore, we expect this latter group to have higher estimated CATEs. As for the message, households with high water use will likely receive a strong T1 message and households with high \(\% \Delta W\) will likely receive a strong T2 message.
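The four sub-samples can be constructed from two median splits, as the following Python sketch shows. It is illustrative only: the function names, the label strings, the example households, and the convention that a household exactly at the median counts as "Low" are assumptions rather than details taken from the study.

```python
from statistics import median

def median_split_groups(households):
    """households: dict id -> (baseline_use, pct_delta_w).
    Returns dict id -> one of the four sub-sample labels."""
    w_med = median(w for w, _ in households.values())
    d_med = median(d for _, d in households.values())
    labels = {}
    for hid, (w, d) in households.items():
        w_part = "High W" if w > w_med else "Low W"
        d_part = "High %dW" if d > d_med else "Low %dW"
        labels[hid] = f"{w_part}-{d_part}"
    return labels

def expected_strong_messages(label):
    """Treatments expected to send a strong message to a sub-sample:
    T1 for high baseline use, T2 for high %dW (low prior conservation)."""
    w_part, d_part = label.split("-")
    strong = []
    if w_part == "High W":
        strong.append("T1")
    if d_part == "High %dW":
        strong.append("T2")
    return strong

# Hypothetical households: (baseline use in kgal, % change 2013 to 2014).
sample = {"a": (50, -20.0), "b": (60, 10.0), "c": (150, -18.0), "d": (200, 12.0)}
groups = median_split_groups(sample)
```

In this example household "b" is a low baseline user with low prior conservation, so only T2 is expected to deliver a strong message, whereas household "d" is expected to receive a strong message under either treatment.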

Second, we estimate a model that interacts each treatment with indicators for above-median pre-treatment water use and above-median \(\% \Delta W\). This formalizes whether the differences observed in the sub-samples are statistically different.
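One way to write such an interaction model, in notation consistent with Eq. 2, is given below. The exact specification is our reading of the description above rather than a reproduction of the estimating equation: \(H^{W}_{i}\) and \(H^{\Delta }_{i}\) are hypothetical names for the indicators of above-median pre-treatment water use and above-median \(\% \Delta W\), and \(\delta _{l}\) and \(\lambda _{l}\) are hypothetical names for the interaction coefficients.

$$\begin{aligned} y_{it} = \alpha + \sum _{l} \left( \gamma _{l} T_{i,l} + \delta _{l} T_{i,l}\times H^{W}_{i} + \lambda _{l} T_{i,l}\times H^{\Delta }_{i} \right) + \theta _{1} H^{W}_{i} + \theta _{2} H^{\Delta }_{i} + \beta {\mathbf {x}_{\mathbf {it}}} + \epsilon _{it} \end{aligned}$$

Under this formulation, \(\delta _{l}\) measures how the effect of letter l differs for high baseline users and \(\lambda _{l}\) measures how it differs for households with low prior conservation, so tests of \(\delta _{l} = 0\) and \(\lambda _{l} = 0\) correspond to the statistical comparisons described above.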

To assist in the interpretation of Table 5, the coefficients for subgroups that are expected to receive a strong normative message are in bold. Column (2) of Table 5 shows that neither treatment is effective among the subgroup of households that have low use and low \(\% \Delta W\) (high prior conservation); both estimated coefficients are small and not significantly different from zero. (Column (1) shows the base ATEs presented in Column (2) of Table 4 for reference.) Among the subgroup of households with low baseline consumption and low prior conservation (high \(\% \Delta W\)), the norm in percentage terms (T2) is statistically significant and three times as effective as the traditional SCM, which is small and not significantly different from zero (column (3)). Among high baseline water users with high prior conservation (low \(\% \Delta W\)) the results are reversed (column (4)). The traditional SCM (T1) is statistically significant and roughly four times as effective on these households as the SCM in percentage terms, which is not significantly different from zero. Column (5) shows that both treatments are statistically significant and highly effective among the subgroup of households that are both large baseline water users and had low prior conservation (high \(\% \Delta W\)). Finally, the interaction model in column (6) formalizes these patterns of heterogeneity. For the SCM in gallons, only high pre-treatment water use is a significant driver of treatment heterogeneity. Conversely, the effectiveness of the SCM in percentage terms is driven primarily by high \(\% \Delta W\) (low prior conservation). For the SCM in percentage terms, high pre-treatment water use has an estimated coefficient of similar magnitude, although it is not statistically significant.

Table 5 Heterogeneity by baseline water use and prior conservation

One explanation for these results is that the primary source of treatment heterogeneity is the strength of the normative message. In T1, households with high pre-treatment consumption were more likely to receive messages that they were consuming more than their peers (see Fig. 2 Panel (a)), and these households responded with larger decreases in consumption. Conversely, for T2, households with high \(\% \Delta W\) (low conservation in the prior drought year) were more likely to receive messages that they were conserving less than their peers in the current drought (see Fig. 2 Panel (d)). These households responded to the stronger normative message with larger decreases in consumption.

Collectively this evidence confirms that the content of the normative message in SCMs is an important source of heterogeneity in consumer response. One important policy implication is that water service providers could target households with customized SCMs based on both baseline consumption and prior conservation, both of which are readily observable in billing data. While targeting has been brought up in the literature previously (Ferraro and Miranda 2013), this has primarily been viewed as a way to make such programs more cost-effective by only sending SCMs to high-use households. We investigate targeting in more detail in Sect. 4.4.

4.3 Robustness

We present several robustness tests to assess the sensitivity of our results to our definition of prior conservation in Table 6. All regressions take the form of the interaction model presented in column (6) of Table 5. Recall that our primary measure of prior conservation is based on the percentage change in consumption from 2013 to 2014, which we replicate in column (1) of Table 6 for reference. In column (2) we define \(\% \Delta W\) as the change in gallons from 2013 to 2014. This is more appropriate if the absolute change in a household’s consumption is more relevant than the percentage change. Next, in column (3) we define high \(\% \Delta W\) as an indicator equal to one if the household did not conserve at all or increased its consumption in 2014. For reference, the mean percentage change in the 2014 irrigation season was \(-2.2\)% and 70% of households reduced their consumption. Therefore, 30% of the sample have high \(\% \Delta W\) in this definition. Lastly, in column (4) we define high \(\% \Delta W\) as a dummy that equals one for the 45% of households that did not meet the 10% goal in 2014. The results in Table 6 are quantitatively and qualitatively similar. The interaction terms are stable for both treatments and the interaction of T2 with high \(\% \Delta W\) is similar across specifications. The 10% goal metric presented in column (4) is not statistically significant, but it is of similar magnitude. Overall the results are robust to various definitions of \(\% \Delta W\).
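The four definitions can be made concrete with a short Python sketch. It is a hedged illustration only: the function and key names are hypothetical, the above-median threshold for the gallons-based measure in column (2) is an assumption, and evaluating the 10% goal on the seasonal percentage change is a simplification for exposition.

```python
from statistics import median

def high_delta_indicators(use_2013, use_2014):
    """use_2013, use_2014: equal-length lists of seasonal use per household.
    Returns dict of definition name -> list of 0/1 'high %dW' indicators."""
    pct = [100.0 * (b - a) / a for a, b in zip(use_2013, use_2014)]
    gal = [b - a for a, b in zip(use_2013, use_2014)]
    pct_med, gal_med = median(pct), median(gal)
    return {
        # (1) primary measure: above-median percentage change
        "pct_above_median": [int(p > pct_med) for p in pct],
        # (2) change in gallons (median threshold assumed)
        "gallons_above_median": [int(g > gal_med) for g in gal],
        # (3) did not conserve at all, or increased consumption
        "no_conservation": [int(p >= 0) for p in pct],
        # (4) did not meet the 10% reduction goal (seasonal change assumed)
        "missed_10pct_goal": [int(p > -10.0) for p in pct],
    }

# Hypothetical seasonal use (kgal) for four households.
flags = high_delta_indicators([40, 400, 100, 50], [34, 372, 88, 55])
```

The first two example households show why the definitions can disagree: a small user who cut 15% saved only 6 kgal, while a large user who cut 7% saved 28 kgal, so their classifications swap between the percentage and gallons measures.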

Table 6 Robustness to prior conservation definition

In addition to the normative message of performance relative to a peer group, both treatments also include an injunctive norm based on whether consumption was 10% lower relative to the same month in 2013. This discrete norm may also have an impact on the treatment effects and could be correlated with pre-treatment consumption and/or prior conservation. We test for the discrete effects of moving above a peer group or failing to meet the 10% goal in a regression discontinuity design. We find no effect of moving above the peer group in either gallons or percentage terms, nor do we find any effect of moving slightly above the 10% goal. This is consistent with the findings of Allcott (2011) for SCMs in energy. The Appendix describes the regression discontinuity design in more detail and presents both graphical evidence and regression discontinuity estimates based on Calonico et al. (2015).

Since we randomized the timing and number of letters that a household receives, we also analyze differences between T1 and T2 in these experimental design features (Table 2). The results are shown in Table 7.Footnote 15 The treatment effects from social comparisons wane significantly over time, with a roughly 50% decline in conservation each month after the first letter was received. Sending a second mailer has a strong and significant effect on conservation. Column (4) tests for any differences in treatment effects between the two SCMs by interacting the pooled variables with a T2 indicator. All the interaction terms are close to zero and not statistically significant. While timing and the number of mailers matter, they appear to affect conservation similarly for both social comparisons.Footnote 16

Table 7 Number of mailers and treatment timing

4.4 Targeting

In this section we explore how our results could allow utilities to better target social comparisons to specific households. Our exercise differs from others (Ferraro and Miranda 2013; Allcott and Kessler 2019) since we consider targeting of different treatments to different households as opposed to the optimal households to receive a single treatment. We run the interaction regression reported in column (6) of Table 5 on total consumption rather than normalized gallons per day. This better places our results in the context of the TMWA’s aggregate drought policies, where the total gallons saved is critical. The treatment as implemented saved roughly 0.31 kgal per person, and in aggregate the nudge saved over 7145 kgals. If TMWA had optimally targeted the same sample by sending the traditional social comparison (T1) to households with high prior conservation and the new social comparison (T2) to households with low prior conservation, the aggregate savings would have been 9865 kgals, 38% higher.Footnote 17 Statistical tests show that the differences in the ATEs between the experiment as implemented and a targeted version are statistically significant at the 10% level.Footnote 18 A key cause of the increase in effectiveness is the ability of the social comparison in percentage terms to generate savings even among households with below-median baseline water use that had achieved relatively little conservation, a relatively large group.
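As a check on the arithmetic, the relative gain from targeting follows directly from the two aggregate figures reported above:

$$\begin{aligned} \frac{9865 - 7145}{7145} = \frac{2720}{7145} \approx 0.38 \end{aligned}$$

that is, the targeted assignment yields aggregate savings roughly 38% higher than the treatment as implemented.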

An attractive feature of our targeting approach is that it is relatively simple to implement with only water billing records. We present these steps to inform water managers or researchers who want to implement a targeted SCM campaign.

  1. Calculate prior conservation for all households. This is simply the change in consumption relative to a base period.

  2. Calculate the median of the prior conservation.

  3. Divide the sample into two groups:

    • Group 1: Above-median prior conservation (households that reduced consumption more, i.e. below-median \(\% \Delta W\)).

    • Group 2: Below-median prior conservation (households that reduced consumption less, i.e. above-median \(\% \Delta W\)).

  4. Assign the SCM in the following way:

    • Group 1 receives the SCM comparing gallons to a peer group (our T1).

    • Group 2 receives the SCM comparing the change in consumption (our T2).
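The steps above can be sketched in Python using only billing-derived data. This is a minimal illustration rather than an operational tool: the function name, the example households, and the treatment of households exactly at the median are assumptions. Following the rule stated earlier in this section, the sketch reads "above median of prior conservation" as households that reduced consumption more (below-median \(\% \Delta W\)), which receive T1, while households with low prior conservation receive T2.

```python
from statistics import median

def assign_targeted_scm(pct_delta_w):
    """pct_delta_w: dict household id -> % change in use vs. the base period.
    Returns dict household id -> "T1" or "T2".

    Households at or below the median % change (high prior conservation)
    get the comparison in gallons (T1); the rest get the comparison in
    percentage reductions (T2). Ties at the median go to T1 by assumption.
    """
    cutoff = median(pct_delta_w.values())
    return {hid: ("T1" if change <= cutoff else "T2")
            for hid, change in pct_delta_w.items()}

# Hypothetical households and their % change in use relative to the base year.
changes = {"a": -25.0, "b": -12.0, "c": 3.0, "d": 15.0}
assignment = assign_targeted_scm(changes)
```

In this example households "a" and "b", which conserved the most, are assigned T1, and households "c" and "d", which increased use, are assigned T2.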

When considering the external validity of the targeting results, we should acknowledge that the targeting gains depend on the correlation between average pre-treatment consumption and prior conservation. The benefits of targeting that we find are likely due to having more households receive strong normative messages. Our targeting approach will increase aggregate savings when there are two distinct groups: (1) households with high baseline consumption and high prior conservation (who receive the SCM in gallons), and (2) households with low baseline consumption but low prior conservation (who receive the SCM in percentage terms). While these two distinct groups had a significant number of households in our sample, this may not be true in other settings.Footnote 19

If utilities choose to target different treatments to specific households, it is important to consider which types of households will receive strong messages given the evidence that SCMs may generate negative moral utility (Allcott and Kessler 2019). In our setting, baseline water use has a much higher correlation with appraised value (our proxy for income) than changes in consumption (0.40 vs. 0.01).Footnote 20 Therefore, targeting based on prior conservation will not significantly change who receives strong normative messages across the income distribution in our setting. It is worth evaluating both the effectiveness and distributional consequences of targeting in any setting where it is applied.

Another consideration for the external validity of targeting is how the heterogeneous effects of SCMs interact with alternative water conservation policies. The recent literature finds that high-use households are less responsive to prices (Baerenklau et al. 2014; Klaiber et al. 2014). So, while price increases may cause a significant decrease in consumption for low-use households, they may not be desirable due to concerns over affordability. There is less heterogeneity in responses to voluntary and mandatory watering restrictions (Wichman et al. 2016).Footnote 21 While there may be some interactions between SCM treatments and alternative policies, Brent and Wichman (2020) show little interaction between prices and SCMs. Therefore, sending a variety of messages targeted to specific households presents an attractive option for increasing the efficiency of SCMs.

5 Discussion

The use of nudges in public policy has exploded in recent years in a variety of sectors ranging from healthy eating to paying taxes on time. Nudges, and particularly social comparisons, are now heavily utilized to improve environmental outcomes including energy and water conservation. Previous research has established some consistent features of SCMs in water and energy: they typically generate small aggregate treatment effects that are concentrated among high-use households. These findings have led to policy recommendations to improve the cost-effectiveness of nudges by targeting high-use households (Ferraro and Miranda 2013). More sophisticated targeting rules also attempt to optimize welfare improvements as opposed to conservation per dollar (Allcott and Kessler 2019). However, there is still uncertainty about why high-use households are more responsive. With traditional SCMs, high-use households may have lower opportunity costs of conservation and they are also more likely to receive strong normative messages conveying inappropriate use relative to their peers. Additionally, most of the discussions of targeting have focused on the traditional social comparison as opposed to potentially new information treatments. Without disentangling the mechanisms it is difficult to consider more sophisticated targeting rules that allow nudges to be effective among lower-use households.

We develop a new SCM where the comparison is in terms of percentage reduction in water use as opposed to absolute water use. This SCM allows more low-use households to receive messages that they are performing worse than their peers. This SCM is also related to the opportunity cost of conservation, whereby a household’s capacity to respond to new conservation policies depends on its past water conservation actions. Similar to other studies, we find high pre-treatment water use is the primary driver of treatment heterogeneity for the standard SCM in gallons; however, prior conservation is the dominant form of heterogeneity for the new SCM in percentage terms. One explanation is that whereas high pre-treatment water use best explains who receives a strong normative message (doing worse than one’s peers) for traditional SCMs, prior conservation is a better predictor of the strength of the normative message in the new SCM in percentage terms.

The findings have important policy implications for utilities using nudges to help manage drought. By targeting our two nudges to households based on pre-treatment water use and prior conservation—both of which are easily calculated with billing data—the aggregate treatment effect increases by 38%. This increase is possible because while the traditional SCMs are typically not effective among below-median water users, our new SCM is effective among the subset of this group who have not responded to prior conservation efforts.

One caveat of our research is that we are unable to fully disentangle the mechanisms that explain our results. We believe the most compelling explanation is that consumers are responding to the strength of the normative message. This explains why prior conservation affects treatment heterogeneity in the SCM in percentage terms but not the traditional SCM. However, the possibility still exists that prior conservation itself dictates the effectiveness of conservation policies. Future research should help disentangle these results by developing better metrics for the opportunity cost of conservation using high-frequency metering data (Mayer 2016).