Introduction

The residential sector consumes 22% of the energy in the USA [1]. Energy efficiency (reducing the energy input for a given unit of energy service output) is widely recognized as a key option to reduce buildings’ energy consumption along with the associated carbon and environmental pollutant emissions [2, 3]. Energy efficiency and reduced emissions are crucial elements of environmental sustainability. There are many energy efficiency programs (e.g., the Building America Program or the Weatherization Assistance Program), including federal-, state-, and utility-level incentives and standards, that encourage homeowners to adopt energy-efficient technologies and promote energy conservation behavior [4]. These incentives are expected to continue to increase over the next decade [5]. Evaluating the actual, empirical impact of energy efficiency is critical to ensure that environmental sustainability, along with the economic and social goals of energy efficiency, can be achieved. However, due to limited data, inadequate evaluation methods, and ineffective communication between different academic disciplines, such empirical evaluation is still limited [6•, 7, 8].

As of 2014, 58.8 million smart metering infrastructure installations provided high-frequency energy demand data (often at 15-min intervals) for US electric utilities and their customers [9]. Smart meters are in 43% of the country and are quickly becoming the norm [10]. The development and deployment of smart meters have opened up a new paradigm for empirical energy efficiency evaluation [11] and underscore the need for new computational methods to evaluate the causal impact of energy efficiency on a large scale. Despite the emerging availability of such rich data, energy efficiency evaluation or measurement and verification (M&V) studies (as opposed to energy demand forecasting or monitoring and control studies) that apply advanced computational methods to smart meter data for large samples of buildings are still rare [8, 12].

Currently, empirical analysis of home energy efficiency improvements is based on costly data collection, limited data, or inadequate methods. Traditional statistical analysis alone, such as parametric or nonparametric regression, is not enough to identify the causal impact of energy efficiency. In most cases, households self-select into energy efficiency installations (e.g., households that pay more attention to energy consumption might be more likely to install energy efficiency improvements). Overall energy consumption is determined by appliance purchase decisions as well as usage decisions. While purchase decisions may be observable to a limited extent, usage decisions are even less so. The observed changes in energy consumption after installations could be partially due to factors that are generally time-variant and unobservable to the statistician, such as households’ environmental awareness, preferences for indoor temperature and lighting conditions, or other occupant behaviors [6•]. Collecting data on these unobservables is costly because it involves installing sensors [13•]. Existing evaluation or M&V frameworks work best when the changes in factors that can influence energy consumption (e.g., weather and occupancy) can be controlled for in the models [14•]. However, for studies that model real-world conditions in which other factors that are hard to control for also change, such as occupant behaviors, technologies, and environmental attitudes, existing M&V frameworks are no longer appropriate. The recent deployment of smart meters and the availability of high-frequency electricity demand data at the customer level could transform current empirical energy efficiency evaluation into an accurate, generalizable, and scalable process.

Gap Between Engineering Simulation and Reality of Home Energy Efficiency Outcomes

Engineering simulation models (e.g., National Energy Audit Tool (NEAT), eQUEST, TREAT, Home Energy Saver Professional (HESPro), and RealHomeAnalyzer) can predict energy savings or technical potential of energy efficiency measures; these predictions are useful for energy audits and for prioritizing energy efficiency investments. However, evidence indicates that realized energy savings are consistently lower than those predicted by engineering simulation modeling. Estimates of the ratio of realized savings to predicted savings range from 30 to 79% [6•, 13•, 14•, 15]. Researchers also debate whether energy efficiency actually saves energy at all (e.g., [16•]).

This deviation between realized and predicted savings can be explained by technology instability, occupant behavior problems, and modeling errors [4, 17]. Technology instability arises when there are quality issues with the technologies or the installation process [13•]. One of the behavioral factors is the rebound effect: energy efficiency reduces the marginal cost of using a given energy service; as a result, consumers use more of that service [18]. Kissock and Eger [7] discuss the limitations of engineering simulation models in predicting energy savings, including the role of the assumptions (e.g., the assumed magnitude of change in usage frequency after the installation of an energy efficiency improvement [19]) and simplifications used to create workable engineering models. For example, one problem with engineering modeling could be overstatement of baseline energy use [19], which suggests that engineering auditing tools might underestimate the efficiency properties of baseline homes prior to energy efficiency improvements and thus overestimate energy savings [6•].

The Department of Energy [20] reviews home energy engineering simulation tools and emphasizes the importance of comparing their predictions with realized energy savings. Statistically reliable empirical evidence is needed that compares realized savings with predicted savings [6•]. There is a lack of studies that comprehensively and systematically examine the factors that cause the deviations between realized and predicted savings. Deviations could be due, at least in part, to a lack of communication among engineers, social scientists, and statisticians. Social scientists can contribute by incorporating behavioral and economic factors, while statisticians can help establish a valid baseline or counterfactual energy consumption.

Existing Empirical Energy Efficiency Evaluation: Measurement and Verification

There are several evolving M&V guidelines (e.g., US Department of Energy Federal Energy Management Program M&V guidelines (FEMPMV) [21]; International Performance Measurement and Verification Protocol [22]; and American Society of Heating, Refrigerating and Air-Conditioning Engineers Guideline 14 Measurement of Energy and Demand Savings [23]). In general, there is no direct way to calculate savings of energy efficiency measures because there are no instruments to measure the energy performance of a building in the same post-installation period (after energy efficiency improvements are installed) as if there were no such installations. FEMPMV calculates energy savings from an energy efficiency improvement as

Savings = (Baseline Energy − Post-Installation Energy) ± Adjustments.

The baseline energy is an approximation of the counterfactual energy consumption without the energy efficiency improvements in the post-installation period and is calculated via engineering simulation, regression models, or both. Adjustments account for changes in factors such as weather, occupancy, and other technologies between the pre- and post-installation periods.
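As a minimal illustration of the savings equation above, the sketch below plugs hypothetical numbers into it. The function name, the signed-adjustment convention, and all values are assumptions made for illustration; they are not taken from the FEMP guidelines.

```python
def mv_savings(baseline_kwh, post_kwh, adjustments_kwh=0.0):
    """Savings = (baseline - post-installation) + signed adjustments."""
    return (baseline_kwh - post_kwh) + adjustments_kwh

# Hypothetical household: 12,000 kWh baseline and 10,500 kWh after the retrofit.
# Assume the post-installation year was hotter, so 300 kWh of extra cooling
# load is added back to put both periods on the same weather conditions.
savings = mv_savings(12000.0, 10500.0, adjustments_kwh=300.0)
print(savings)
```

A positive adjustment here raises the estimated savings because the hotter post-installation period would otherwise make the improvement look less effective than it was.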

Increasingly, researchers use large-sample utility billing data to evaluate the realized savings from energy efficiency improvements. Results suggest that the savings range from 8 to 21% [6•, 13•, 14•, 15, 24, 25•, 26]. Table 1 summarizes the key findings of these residential empirical energy efficiency evaluation studies, including the types of energy efficiency measures examined, the energy savings estimated, and the methods used. Energy efficiency decision makers such as policy makers, investors, and building owners do not always have adequate data for their region; therefore, there is a need for more studies that use large samples of buildings in a greater variety of regions [6•].

Table 1 Summary of recent large-sample empirical residential energy efficiency evaluation studies (non-high-frequency smart meter data analysis)

The most difficult component of the empirical evaluation is constructing the baseline energy consumption as shown in Fig. 1 [21]. Due to the difficulty and high cost of implementing a randomized control trial experiment or installing sensors, most studies use pre-installation energy data with or without modifications from engineering models or control groups as the baseline for energy use. Such approaches require assumptions of how treated households behave post-installation, but these assumptions may not accurately reflect reality due to self-selection bias and omitted variable bias. Omitted variable bias is the biggest threat to a valid empirical evaluation. Variables such as occupancy, occupant behaviors, appliance stocks, electricity pricing, on-site renewable energy technologies, and in-home charging of electric vehicles can influence both the households’ decisions to install energy efficiency improvements and energy consumption [28]. These variables are generally not addressed in large-sample studies, potentially causing bias in the estimation. Thus, more justifiable methods are needed to construct baseline energy use that better model the reality [27].

Fig. 1 Difficulty in baseline energy consumption construction

Except for Metoyer and Dzvova [8], Boomhower and Davis [12], and Novan et al. [29], most residential energy efficiency evaluations look at monthly or average daily energy consumption and ignore the intra-day timing of energy savings. Timing is important here in three respects. First, the utility load curve is not flat (as illustrated in Fig. 2); service providers have incentives to flatten the load curve in order to delay expensive capital investment as well as to increase grid stability [30]. Energy savings during peak load hours potentially add to the benefit from a load management perspective. Second, for households that are on time-of-use electricity rates (i.e., electricity prices are higher during peak load hours), the timing of savings means different economic incentives and outcomes. Third, because the mix of fuels used to generate electricity changes with time of day and season, savings at different times will have different environmental impacts. As such, more dynamic studies on the timing of savings are needed [12].
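To make the second point concrete, the sketch below values the same total savings under a time-of-use tariff and under a flat average price. The hourly savings, the tariff, and the peak window are all hypothetical values chosen for illustration.

```python
# Hypothetical hourly savings (kWh) for one day, larger during the afternoon peak
hourly_savings_kwh = [0.1] * 12 + [0.5] * 8 + [0.1] * 4   # 24 values

# Hypothetical time-of-use tariff ($/kWh): peak price from 12:00 to 19:59
price_per_kwh = [0.10] * 12 + [0.30] * 8 + [0.10] * 4

def value_of_savings(savings, prices):
    """Sum of hourly savings weighted by the price in the same hour."""
    if len(savings) != len(prices):
        raise ValueError("savings and prices must have the same length")
    return sum(s * p for s, p in zip(savings, prices))

total_kwh = sum(hourly_savings_kwh)
tou_value = value_of_savings(hourly_savings_kwh, price_per_kwh)
# Same total kWh valued at the simple average price, ignoring timing
flat_value = total_kwh * (sum(price_per_kwh) / len(price_per_kwh))
print(total_kwh, tou_value, flat_value)
```

Because the assumed savings are concentrated in the peak window, the time-of-use value exceeds the flat-price value; the same weighting could be applied with hourly emission factors instead of prices.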

Fig. 2 California electrical load by hour, 2 July 2016. Source: California Independent System Operator Corporation

Studies mostly analyze the heterogeneity of the impacts of energy efficiency in a traditional way, such as separately analyzing different groups of households or, in regression models, adding interaction terms between building attributes and the energy efficiency variable [13•, 24]. Nonlinear or more flexible functional forms need to be applied when modeling the heterogeneity [29]. In addition, it is difficult to tease out the impact of individual retrofits because different retrofits and appliances interact with each other, e.g., lighting with cooling, HVAC with insulation [14•, 31].

High-frequency Data in Energy Demand Analysis

High-frequency smart meter data are valuable for revealing household energy consumption behaviors, analyzing the heterogeneity of technology impacts, and generating precise predictions of energy consumption [32]. Machine learning techniques have been shown to predict energy consumption more reliably than traditional regression or simulation models [33], especially given their ability to model nonlinear patterns and interactions [34,35,36]. With increasing access to smart meter and sensor data, many researchers have started using such data to model electricity consumption. However, the majority of these studies focus on the total electrical load of a region [37,38,39] or on commercial buildings [40,41,42,43,44]. For the residential sector, studies that use high-frequency energy demand data and advanced data analytics are mostly based on data from one [45, 46] or several buildings [47, 48], making it difficult to generalize the results. Kavousian et al. [49] use large-sample high-frequency data to analyze energy consumption; that study does not, however, focus on energy efficiency evaluation and, lacking pre-installation periods, has no valid way to construct baseline energy consumption.

Most building-level energy studies using high-frequency data focus on forecasting energy demand or on monitoring and control rather than on evaluating energy efficiency. Energy savings estimated from monthly billing data can differ from estimates based on high-frequency data [8]; therefore, there is a need for comparison analyses as well as explanations for the differences. With the exception of Burlig et al. [50], there is a lack of large-sample, building-level machine learning studies using high-frequency data that are generalizable and representative and that can potentially provide more insights for improving forecasting accuracy.

Future Directions

In this section, we discuss the roadmap of using big data, machine learning, smart meters, and high-frequency energy consumption data to improve the accuracy of energy efficiency M&V. Fig. 3 is a visual summary.

Fig. 3 Roadmap of future directions for energy efficiency M&V

Statistically sound empirical energy efficiency causal impact evaluation studies should be conducted, which can provide valuable new evidence of the empirical savings of energy efficiency improvements. Empirical savings from these causal impact studies should then be compared with savings predicted by engineering models. If there exists any gap, systematic examination of the relevant technical, behavioral, and economic factors would be needed in order to help reduce the uncertainty of future energy audits and improve prioritization of energy efficiency investments.

With the availability of big data related to energy efficiency and energy consumption, such empirical energy efficiency causal impact studies should use large samples, producing results that are statistically sound, representative, and generalizable. Researchers should advance existing high-frequency residential energy studies by examining a completely different scale of sample, moving from the few buildings used in most studies to thousands or even millions of residential electric customers. This scale of data provides greater variation in building characteristics, appliance stocks, and demographics; thus, the results will be more precise and representative, and impact heterogeneity can be analyzed. To handle this scale of analysis, a new evaluation framework is necessary.

As discussed earlier, big data allows us to overcome the limitations caused by the absence of true randomized control trials by estimating more precise counterfactual energy consumption. To address baseline construction and omitted variables, a novel evaluation framework that uses rigorous and advanced statistical analysis should be developed. For example, a combination of matching (to select a valid control group) and flexible fixed effects panel regressions [12, 13•, 14•] can be used to construct valid baseline energy consumption and to address the omitted variable issues. Researchers can use both the energy consumption of control customers matched on a large set of attributes and the pre-installation period consumption of the treatment customers, while controlling for other time-variant conditions (e.g., weather and occupancy) and using flexible fixed effects to control for unobservable factors in the panel regressions. Since the ground truth of the counterfactual is not observed for any household [51], cross-validation and out-of-sample prediction are not very relevant to causal inference with panel regressions here.
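The idea of combining a matched control group with pre-installation consumption can be sketched as a simple difference-in-differences calculation. This is a deliberately reduced illustration, not the panel regression framework of the cited studies: it matches on a single hypothetical attribute (floor area), matches with replacement, and uses made-up household records.

```python
from statistics import mean

# Hypothetical records: (household id, floor area m2, pre kWh/month, post kWh/month)
treated = [("t1", 120, 900.0, 780.0), ("t2", 200, 1400.0, 1250.0)]
controls = [("c1", 118, 880.0, 860.0), ("c2", 205, 1380.0, 1370.0),
            ("c3", 60, 500.0, 495.0)]

def nearest_control(unit, pool):
    """Pick the control household closest in floor area (one-attribute matching)."""
    return min(pool, key=lambda c: abs(c[1] - unit[1]))

def did_estimate(treated_units, control_pool):
    """Average of (treated change) minus (matched control change)."""
    effects = []
    for t in treated_units:
        c = nearest_control(t, control_pool)
        effects.append((t[3] - t[2]) - (c[3] - c[2]))
    return mean(effects)

effect = did_estimate(treated, controls)
print(effect)  # negative values mean consumption fell relative to matched controls
```

Subtracting the matched control's change removes shifts common to both groups (for example, a mild season), while differencing against each household's own pre-installation period removes time-invariant household traits. A full analysis would match on many attributes and add weather controls and fixed effects.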

The high-frequency electricity demand data allow us to estimate energy savings at hourly or even 15-min intervals. The intra-day timing of energy savings allows us to more precisely evaluate the impact on grid operation, economic incentives for utilities and residential customers, and environmental emissions. In future studies, researchers should use high-frequency data to evaluate energy savings at different times of day and in different seasons.
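As a rough sketch of how 15-min interval data can be turned into hour-of-day savings, the code below sums interval readings to hourly totals, averages them by hour of day, and differences the pre- and post-installation profiles. The data layout and readings are hypothetical, and the plain pre/post difference shown here is descriptive only; it is not a causal estimate.

```python
from collections import defaultdict

def hourly_profile(readings):
    """Mean hourly kWh by hour of day from (day, hour, kwh) 15-min readings."""
    per_day_hour = defaultdict(float)
    for day, hour, kwh in readings:
        per_day_hour[(day, hour)] += kwh          # four intervals per hour
    by_hour = defaultdict(list)
    for (day, hour), kwh in per_day_hour.items():
        by_hour[hour].append(kwh)
    return {h: sum(v) / len(v) for h, v in by_hour.items()}

# Hypothetical data: two hours (03:00 and 17:00), one day before and one after
pre = [(1, 3, 0.10)] * 4 + [(1, 17, 0.50)] * 4
post = [(1, 3, 0.10)] * 4 + [(1, 17, 0.35)] * 4

pre_profile, post_profile = hourly_profile(pre), hourly_profile(post)
savings_by_hour = {h: pre_profile[h] - post_profile[h] for h in pre_profile}
print(savings_by_hour)
```

In this made-up example all of the savings fall in the 17:00 hour, which monthly billing data would not reveal.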

Nonparametric and machine learning techniques such as random forest [52, 53] can be applied to rich high-frequency data to uncover the nonlinear impacts of various factors on the efficacy of energy efficiency improvements. Such fine-grained analysis of heterogeneity is critical for customized energy efficiency solutions for individual buildings. When these data are combined with other technology information, such as appliance saturation surveys or energy efficiency installation dates, the impact of certain improvements or bundles of improvements can be isolated. Being able to disentangle the effects of individual improvements or combinations of improvements is important for decision makers who prioritize energy efficiency investments.

Rather than focusing on forecasting energy demand, evaluation studies should examine the causal impact of energy efficiency. Causal inference concerns how much energy savings can be attributed to an energy efficiency improvement rather than how much savings are merely associated with it. For example, an environmentally savvy household may be more likely to install energy efficiency improvements and may save energy after the improvements, but its savings could also be partially due to its environmental attitudes. This phenomenon is called self-selection bias. If the statistician is not able to observe the household's environmental savviness, s/he might attribute all of the savings to the energy efficiency improvements. Causal inference is critical for decision makers who must judge the cost-effectiveness of energy efficiency improvements. Machine learning techniques have mostly focused purely on improving forecasting and prediction, and recent developments have not contributed significantly to causal inference [68]. Selection bias and omitted variable bias are two key challenges for consistently assessing the causal impact of energy efficiency improvements. A combination of quasi-experimental design or randomized control trial [6•], flexible fixed effects panel regression, and machine learning techniques [50] can be used to develop an energy efficiency evaluation framework for causal impact. Balandat [54•] uses machine learning methods for causal inference on high-frequency electricity demand time series to evaluate the impact of a large-scale demand-side management program in California.
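A small simulation can illustrate why self-selection inflates a naive comparison of adopters and non-adopters. Everything in this sketch is assumed for illustration: the true effect, the adoption probabilities, the consumption levels, and the single unobserved "savvy" trait.

```python
import random

random.seed(0)
TRUE_EFFECT = -100.0   # assumed causal effect of the improvement (kWh/year)

adopters, non_adopters = [], []
for _ in range(20000):
    savvy = random.random() < 0.5                         # unobserved attitude
    adopts = random.random() < (0.8 if savvy else 0.2)    # self-selection
    # Savvy households use less energy regardless of any improvement
    use = 5000.0 + (-300.0 if savvy else 0.0) + random.gauss(0, 50)
    if adopts:
        adopters.append(use + TRUE_EFFECT)
    else:
        non_adopters.append(use)

# Naive estimate: difference in mean consumption between the two groups
naive_estimate = (sum(adopters) / len(adopters)
                  - sum(non_adopters) / len(non_adopters))
print(round(naive_estimate, 1), TRUE_EFFECT)
```

Under these assumptions the naive difference is expected to be about -280 kWh/year, far larger in magnitude than the assumed true effect of -100, because adopters are disproportionately savvy households that would have used less energy anyway.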

Table 2 summarizes the pros and cons of existing energy efficiency evaluation methods and how big data techniques enabled by high-frequency demand data can help improve such methods. Three working papers serve as good examples of applying the new techniques discussed in the roadmap shown in Fig. 3. The first study is Novan et al. [29], which finds that the building codes adopted in California in 1978 help residential buildings save 13% of electricity for cooling. They use individual building-level high-frequency data to estimate building-specific temperature response functions, which are then used to construct the counterfactual energy consumption in the post-code period had the codes not been adopted. The second study is Boomhower and Davis [12], which applies a rich set of fixed effects in panel regression models to help eliminate unobservable confounding factors. They estimate the energy savings of energy-efficient air-conditioners by hour of day and assess the economic value of such energy savings. The third study is Burlig et al. [50], which uses high-frequency data and the machine learning technique LASSO, a form of regularized regression, to construct counterfactual electricity consumption. They evaluate energy efficiency improvements in school buildings and find savings of 2–5%.
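The general idea of building a counterfactual from a fitted temperature response can be sketched with a one-variable least squares fit. This is a simplified stand-in, not the method of any of the three studies: it swaps in a plain linear fit in place of building-specific response functions or LASSO, and the temperatures and loads are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical pre-installation hours: outdoor temperature (C) and load (kWh)
pre_temp = [20.0, 25.0, 30.0, 35.0]
pre_load = [1.0, 1.5, 2.0, 2.5]          # 0.1 kWh per degree above 10 C
a, b = fit_line(pre_temp, pre_load)

# Post-installation hours: predict the load that would have occurred
# at the observed temperatures without the improvement
post_temp = [22.0, 33.0]
post_load = [1.0, 1.9]
counterfactual = [a + b * t for t in post_temp]
savings = sum(c - o for c, o in zip(counterfactual, post_load))
print(round(b, 3), round(savings, 3))
```

The model is fitted only on pre-installation data, so its post-period predictions act as the baseline; the gap between predicted and observed load is the estimated savings.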

Table 2 Conventional M&V approaches versus new techniques

Conclusion

New energy big data opportunities can potentially transform existing energy efficiency evaluation studies into more statistically sound and generalizable ones that provide richer information about the impacts of energy efficiency measures [55]. Exploiting big data effectively will require interdisciplinary research that links social science and information science with energy systems science and engineering [56]. As the Internet becomes more pervasive (i.e., with the Internet of Things), potential sources of data increase dramatically, well beyond smart meters. Virtually all key appliances and energy conversion units (such as heating, air-conditioning, and ventilation) will become “smart” and generate large sets of data. These new types of big data can help researchers analyze occupant behaviors more efficiently. This will create new opportunities, first for big data analytics and then for energy management intervention, including customer feedback and engagement [57].

New energy efficiency evaluation studies enabled by big data could facilitate the adoption of energy-efficient technologies that are crucial for tackling climate change, reducing energy consumption, and ensuring environmental sustainability. Perceived risk and lack of information are the two biggest reasons for the slow diffusion of these technologies [4, 58]. Empirical energy efficiency evaluation studies can help resolve the perceived uncertainty associated with the actual energy savings of these technologies by providing accurate estimates based on a large sample of high-frequency data and an innovative energy efficiency evaluation framework. The energy efficiency industry requires customized energy efficiency solutions at the individual household level. Evaluation studies, bolstered by big data, can better evaluate how the impacts of energy-efficient technologies vary with factors such as household characteristics, building attributes, technology attributes, and weather conditions, and thus can help prioritize household-level energy efficiency investments and provide customized recommendations. With high-frequency data from smart meters, estimates of the timing of energy savings allow deeper and more accurate analysis of the environmental and economic impacts of energy-efficient technologies.