1 Introduction

According to recent studies, about 40% of energy produced worldwide is consumed by buildings, and more than half of this is used by heating, ventilation and air conditioning (HVAC) systems [3, 8, 10]. Pan et al. [9] point out that, due to thermal inertia, it is more efficient to maintain temperature in a room or building than to raise or lower the temperature.

Accurate temperature forecasts can help reduce energy usage in buildings by using future values of temperature when deciding whether or not to activate the HVAC [22]. Moreno et al. [7] achieve estimated energy savings of 20% in a realistic situation based on the presence of persons in a room. Yuan et al. [20] achieve 20% savings by exploiting thermal inertia when assigning rooms to meetings, scheduling contiguous meetings in the same room. Prívara et al. [11] report 17–24% savings for a large university building by using a model predictive controller.

Model predictive controllers (MPC) produce a control signal for HVAC systems, minimizing a cost function based on energy consumption. The cost function takes into account a prediction horizon and a control horizon [1]. The prediction horizon used in practice depends on how much data the HVAC controller needs to achieve acceptable comfort while reducing energy consumption.

While the cost savings may be significant, the overhead and operational costs associated with MPC may discourage adoption. These costs include the installation and maintenance of the sensing devices and a wireless sensor network, and the computational cost of modelling temperature as a function of the data generated by the sensors. To encourage wider adoption of MPC, in this paper we seek to reduce these associated costs. Specifically, we identify sensor data with little influence on forecast accuracy.

In the remainder of this paper, we review the sensor data related to temperature forecasting reported from a smart house. We discuss the nature of the search, provide a best-first search procedure to select sensors, and compare the outcomes as we vary the history horizon, forecast horizon and the error metric. We report on related work on this data, and conclude with recommendations for using our results in both new and existing installations.

This paper is a full exposition of initial work [16]. This presentation adds to the previous work in two main ways: it covers longer sensor histories, and it applies the technique to forecasts of temperature differences.

2 Background

2.1 Data from a smart home

The SML House [22] competed in the Solar Decathlon 2012 competition [19], using 88 sensors and 49 actuators. In this paper, as in our previous work [14, 15], we use a publicly available subset of this data [18], specifically values collected during March and April 2012 from 18 sensors every quarter-hour.

The sensors reported are as follows:

  1. Wi – wind speed
  2. Tw – twilight indicator
  3. TP – predicted temperature
  4. TL – living room temperature
  5. TD – dining room temperature
  6. T – external temperature
  7. SW – sun on the west wall
  8. SS – sun on the south wall
  9. SE – sun on the east wall
  10. Pcp – precipitation
  11. P – sun irradiance measured by a pyranometer
  12. LL – lights in the living room
  13. LD – lights in the dining room
  14. HL – humidity in the living room
  15. HD – humidity in the dining room
  16. H – external humidity
  17. CL – carbon dioxide sensor in the living room
  18. CD – carbon dioxide sensor in the dining room

2.2 Linear and lasso regression

Our forecasting methods are based on linear regression, defined as follows. Given a set of independent variables \(x_1, \ldots, x_n\) and a dependent variable \(y\) of interest that we want to forecast, we seek parameters \(\beta_0, \ldots, \beta_n\) so that \(\hat{y} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i\) is a good approximation of \(y\). When presented with a set of \(m\) instances of each \(x_i\), called \(x_{i,j}\), and the corresponding instances \(y_j\), we select the \(\beta_i\) parameters to minimize the residual sum of squares (RSS):

$$\sum_{j=1}^{m} \left(\beta_0 + \sum_{i=1}^{n} \beta_i x_{i,j} - y_j\right)^2 $$

Lasso regression [17] minimizes \(\text{RSS} + \lambda \sum_{i=1}^{n} \lvert \beta_i \rvert\), where \(\lambda\) is a tuning parameter that balances the emphasis between reducing error and using small \(\beta\) coefficients. Some \(\beta_i\) may shrink to exactly zero, which deselects the corresponding variable \(x_i\), thus endowing lasso regression with an embedded form of feature selection. For lasso regression, we use the R library glmnet [4, 5, 13].

(We avoid the usual vector notation shorthand for linear regression so we can better relate this background to Section 3, which requires us to adjust the indices of summation for a precise description.)
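To make the embedded selection concrete, the following minimal R sketch fits a lasso model with glmnet on illustrative data; the matrix x and response y here are placeholders for the lagged sensor readings and internal temperature used later.

```r
library(glmnet)

# Illustrative data: m = 200 instances of n = 10 predictors; in our
# setting the columns would be lagged sensor readings.
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200)
y <- x[, 1] - 2 * x[, 3] + rnorm(200)

# alpha = 1 selects the lasso penalty; cv.glmnet chooses lambda by
# cross-validation, trading error against the size of the coefficients.
fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the chosen lambda; exact zeros mark deselected
# predictors, which is the embedded feature selection.
coef(fit, s = "lambda.min")
```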

2.3 Feature selection

Having too many independent variables, or features, can confound a forecast model: irrelevant details overwhelm the modelling technique and prevent it from computing an accurate forecast. Feature selection, the process of selecting specific features from which to build a model, is roughly divided into wrapper techniques, filter methods and embedded methods. Wrapper techniques enumerate various combinations of features, measure the accuracy of the resulting models, and select the combination that exhibits the lowest error. Filter techniques measure the usefulness of features using computationally fast metrics. Embedded techniques identify useful features as a by-product of the modelling process; lasso regression is an example.

In this paper, we focus on wrapper techniques guided by the best-first search provided by the R library FSelector [12], and on embedded techniques, using lasso regression.

3 Models using lagged sensor readings

When creating a model from which to forecast temperatures, we provide multiple historical readings from each sensor. Given a history of \(b\) time periods, where readings are taken every quarter-hour, we provide \(b + 1\) lagged readings from each of \(s\) sensors, which includes the current period at lag 0. Let \(x_{k,t}\) be the \(t\)-th observation for sensor \(k\), counting from the first observation at time \(t = 1\) as it appears in the training data. Let \(y_t\) be the internal temperature of the house at time \(t\). We are given observations over the \(m\) time periods in the training data. We create a linear model for each future period \(f\). We define the RSS as

$$\text{RSS}(f) = \sum_{t=b+1}^{m} \left(\beta_{f,0} + \sum_{g=0}^{b} \sum_{k=1}^{s} \beta_{f,k,g}\, x_{k,t-g} - y_{f+t}\right)^2 $$

In this equation, \(t\) starts at \(b + 1\) because there are no observations for the lagged readings for the first \(b\) data points. Using lasso regression, we choose values for the coefficients \(\beta_f = \{\beta_{f,0}\} \cup \{\beta_{f,k,g} \mid g = 0, \ldots, b,\ k = 1, \ldots, s\}\), where \(g\) identifies the lag and \(k\) identifies the sensor. The coefficients in \(\beta_f\) specify a model for each future interval \(f\). We use two different forecast horizons; \(h\) is either 12 or 48 future time periods, i.e. 3 or 12 h.
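As a concrete illustration of how the lagged predictors are assembled, the following R sketch builds the design matrix; the matrix sensors (m rows of quarter-hourly readings for s sensors) and the name make_lagged are illustrative assumptions, not part of our implementation.

```r
# Build the lagged design matrix from an m x s matrix of readings,
# using b back observations per sensor.
make_lagged <- function(sensors, b) {
  m <- nrow(sensors)
  # For each lag g = 0, ..., b, take rows (b+1-g) .. (m-g), so that the
  # row for time t holds the readings x[k, t-g] for every sensor k.
  cols <- lapply(0:b, function(g) sensors[(b + 1 - g):(m - g), , drop = FALSE])
  do.call(cbind, cols)   # (m - b) rows, s * (b + 1) columns
}
# For horizon f, the response is the internal temperature f steps ahead,
# aligned with the same rows t = b + 1, ..., m.
```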

The coefficients are computed on the training data, which is the first 2/3 of the data. Once they are computed, we switch over to the test data, which is the final 1/3 of the data. Thus \(x\) and \(y\) below refer to observations in the test data, and \(m\) to the number of observations in the test data. We report the RMSE for each future interval \(f\). In our experiments, \(f = 1, \ldots, 12\) for forecasts 3 h into the future, and \(f = 1, \ldots, 48\) for forecasts 12 h into the future.

$$\text{RMSE}(f) = \sqrt{\frac{1}{m-b} \sum_{t=b+1}^{m} \left(\beta_{f,0} + \sum_{g=0}^{b} \sum_{k=1}^{s} \beta_{f,k,g}\, x_{k,t-g} - y_{f+t}\right)^2} $$

We report error metrics on all forecasts \(f\) over the forecast horizon \(h\), including \(\text{Mean RMSE} = \frac{1}{h} \sum_{f=1}^{h} \text{RMSE}(f)\) and \(\text{Maximal RMSE} = \max_f \text{RMSE}(f)\).
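For concreteness, a small sketch of the evaluation just described; data is assumed to be ordered by time, and rmse_for(f) is a hypothetical helper that computes RMSE(f) on the test data.

```r
# Chronological split: first 2/3 of the data for training, final 1/3
# for testing.
cut   <- floor(2 * nrow(data) / 3)
train <- data[seq_len(cut), ]
test  <- data[(cut + 1):nrow(data), ]

# Summary metrics over the forecast horizon, assuming rmse_for(f)
# returns RMSE(f) on the test data.
h         <- 12                       # 3-h horizon at quarter-hour steps
rmse      <- sapply(seq_len(h), rmse_for)
mean_rmse <- mean(rmse)               # Mean RMSE
max_rmse  <- max(rmse)                # Maximal RMSE
```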

4 Useful and confounding sensors

In our experiments we consider various sets of sensors, and from each, we measure the error from a forecast model based on the data from those sensors. To measure error, with the exception of the selected set of sensors, we hold all other factors fixed, including the training and test data, the size h of the forecast horizon and the number b of back observations. Thus, the error from the model is a function only of the set of sensors.

It will often occur that one sensor in a set of sensors is useful, in that it provides predictive power. Let S be a set of sensors and let a and b be individual sensors. We say a is useful in S when the error from S ∖ {a} is greater than the error from S. If a is useful in S, then a ∈ S. It may also occur that two sensors each provide the same predictive power, for instance when they report similar information; in this case we can use either one. More precisely, we say a and b are interchangeable in S when a and b are useful in S and the error from S ∖ {a} is the same as the error from S ∖ {b}.

The definitions in this paper are relative to some tolerance, below which forecast error is insignificant. We do not define this tolerance here, but note that it will be determined by the model predictive controller as follows: If an increase in the forecast error does not affect the controller’s ability to save energy, then that increase is below the tolerance. In this paper, we speak informally and understand an error to be greater than another when the difference exceeds this tolerance and likewise say that two errors are the same when their difference falls below this tolerance. In these experiments, since we are not measuring the performance of a controller, we take the tolerance to be 0.
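Stated as code, the usefulness test under a tolerance looks as follows; err(S) is a placeholder for fitting the forecast model on sensor set S and returning its error.

```r
# a is useful in S when dropping it raises the error by more than tol;
# S and a are character vectors of sensor names, err() is assumed.
is_useful <- function(a, S, err, tol = 0) {
  err(setdiff(S, a)) - err(S) > tol
}
```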

Note that useful and interchangeable are defined with respect to a set of sensors. We may find that while a is useful in S ∖{b}, a is not useful in S ∪{b}. For instance, this will happen when a and b are interchangeable in S.

We may also observe that including a sensor in a model gives rise to a higher error. This can happen when the sensor leads us “down the garden path,” for instance, when it appears to be correlated to the observed temperature in the training data, but oppositely correlated in the test data. We say that a confounds S when the error from S ∖ {a} is lower than the error from S ∪ {a}.

We may also observe sensors that together increase accuracy but individually do not. This can happen when the model uses an interaction between the sensors. Suppose the laundry is always done on Saturday and no other day, and starts when someone enters the laundry room on Saturday. Suppose one of the sensors reports the day of the week and another reports motion in the laundry room. Then the modeller may recognize that a heating event occurs when both sensors are activated, for the room heats up when the laundry is done. In this case, if the modeller associated a heating event just with motion in the laundry room, regardless of the day of the week, it would be misled on the non-Saturdays, and the model's error would increase. Likewise, it would be misled by associating a heating event with Saturday for those weeks where no laundry was done. Thus, the laundry room motion sensor and the day-of-week sensor each individually confound the model; together, however, they improve it. We say that two sensors a and b are co-dependent in S if individually each of a and b confounds S ∖ {a, b}, but the error of S ∪ {a, b} is smaller than the error of S ∖ {a, b}.

We seek a set S of sensors that has minimal error among all subsets of the sensors. This implies that all sensors in S are useful, and that all sensors not in S confound S. We say that an ordered set of sensors gracefully degrades if we can remove one sensor at a time, in that ordering, such that the error always increases. Given an ordered set of sensors that gracefully degrades, its gracefully degrading sequence of sets of sensors is the sequence that arises from removing sensors according to the ordering.

To guide the cost-benefit analysis, given a gracefully degrading set of sensors, we advocate computing two costs for each set of sensors in the gracefully degrading sequence. One is the energy cost, which is higher when the forecast error is higher; along the gracefully degrading sequence, the error always rises as sensors are removed. The other cost arises from purchasing, installing and maintaining the sensors, so it is lower when the number of sensors is lower.

5 Ordering sensors by influence

Our goal is to identify sensors that should be included in the model. Because we use lagged data in our model, each sensor provides many predictors in the regression, one for each quarter-hour of historical observations. For a given sensor, we may consider whether to include all of the predictors arising from this sensor, some of them, or none of them. This leads to a large search space. For instance, one hour of lagged observations (plus the current observation) for each of 18 sensors gives 5 × 18 = 90 predictors in the model. This gives rise to \(2^{90}\) sets of predictors, which is clearly infeasible to search entirely. We also want to consider 2 h of readings per sensor, but we wish to avoid searching a space of \(2^{162}\) sets of predictors, which would take almost \(10^{42}\) years if we could consider one set each second.

We rely on lasso regression, which selects features among the predictors in the regression. Since lasso feature selection is in place, we need only consider sensor selection, so the search space is reduced from \(2^{90}\) to \(2^{18}\), and is independent of the number of lagged readings per sensor. A complete search would take about 3 days if we could consider one set per second. We simplify it further by employing best-first search, which is a variant of bottom-up search that limits non-deterministic choices and is guided by a heuristic. Our heuristic prefers lower forecast error. This search procedure and the heuristic are provided by FSelector. Since best-first search is well known, we describe it here at a high level.
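The timing figures above can be checked directly, assuming one candidate set per second:

```r
# All 2^18 sensor subsets at one per second, in days:
2^18 / (60 * 60 * 24)              # about 3.03 days

# All 2^162 predictor subsets (2 h of lags), in years:
2^162 / (60 * 60 * 24 * 365.25)    # about 1.8e41, i.e. almost 10^42 years
```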

Available is the set of sets of sensors that may be considered during the running of Algorithm 1.

Visited is the set of sets of sensors that is accumulated during the running of the algorithm. It contains all of the sets of sensors that were explored.

Algorithm 1 best-first search

Initially, Visited is an empty set, and Available contains the empty set of sensors. The model for this empty set simply predicts the mean temperature. The search proceeds with a selection of S as the set from Available with lowest error. S is removed from Available. Nondeterministically, FSelector selects a new sensor a from among the sensors in the SML house that do not occur in S. FSelector usually tries up to five different choices. For each of these choices of a, if a is useful in S ∪ {a}, then S ∪ {a} is added to Available for future consideration. S ∪ {a} is also added to Visited.

When there are no more sets in Available, the algorithm concludes. Among the sets in Visited, the set with minimal error, S*, is taken as the estimate of S.
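A minimal sketch of invoking this search through FSelector follows. The evaluator and fit_and_score are illustrative placeholders for the lasso modelling of Section 3; best.first.search, with its default of five backtracks (max.backtracks = 5), is the library's interface for the search described above.

```r
library(FSelector)

# Wrapper evaluator: fit the forecast model on the named sensors and
# return a score to maximize; fit_and_score() is a placeholder that
# returns the model's forecast error on held-out data.
evaluator <- function(sensors) {
  -fit_and_score(train, sensors)   # lower error => higher score
}

all_sensors <- c("Wi", "Tw", "TP", "TL", "TD", "T", "SW", "SS", "SE",
                 "Pcp", "P", "LL", "LD", "HL", "HD", "H", "CL", "CD")

# Best-first search over sensor subsets, guided by the evaluator.
best <- best.first.search(all_sensors, evaluator)
```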

Because the heuristic guides the search toward the most promising parts of the search space, good estimates of S are expected. S* is confounded by all sensors not in S*, so it is a local minimum. However, the non-deterministic choices made by FSelector do not consider all possible choices. The search space is not entirely explored, and S* is not guaranteed to be a global minimum.

Given the sets that were visited by best-first search, we use a second algorithm to generate a sequence of these sets with gradually increasing error.

Algorithm 2 construct the sensor sequence

Let \(S_1 = S^*\), which is the set in Visited with lowest error. Let \(i = 1\) and define \(S_{i+1}\) as the set with lowest error that is both a subset of \(S_i\) and a visited set. Proceed to increment \(i\) and compute the next set until \(S_i\) is empty. Report the sequence of \(S_i\)'s and the sequence of set differences between them. In most cases the set differences will be individual sensors.

Because Algorithm 2 considers only visited sets, there is no error calculation required and Algorithm 2 is very efficient.
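A sketch of Algorithm 2 in R follows; visited is assumed to be a list of character vectors (the sensor sets explored by Algorithm 1), and err(S) a lookup of the error already recorded for a visited set, so no model is refit.

```r
sensor_sequence <- function(visited, err) {
  errs     <- vapply(visited, err, numeric(1))
  current  <- visited[[which.min(errs)]]   # S1 = S*, the lowest error
  sequence <- list(current)
  while (length(current) > 0) {
    # Visited proper subsets of the current set.
    subs <- Filter(function(v) length(v) < length(current) &&
                     all(v %in% current), visited)
    if (length(subs) == 0) break
    # S_{i+1}: the visited proper subset with lowest recorded error.
    current  <- subs[[which.min(vapply(subs, err, numeric(1)))]]
    sequence <- c(sequence, list(current))
  }
  sequence
}
```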

In the next section, we consider the effectiveness of this best-first search using the data of the SML house. There is no guarantee that Algorithm 1 will deliver the overall best set S. There is no guarantee that Algorithm 2 will generate the best sequence. However, Algorithm 1 and Algorithm 2 together produce a sequence that can be used to guide the cost-benefit analysis.

6 Experimental results

We investigate the sensor selection when forecasting temperature for different forecast horizons, different amounts of historical data and different error metrics. We also investigate sensor selection when forecasting temperature differences.

6.1 Forecasting temperature from one hour of sensor data

We ran experiments using four readings per sensor, shown in Tables 1, 2, 3 and 4. We varied the forecast horizon between 3 and 12 h into the future.

Table 1 Maximal RMSE for selected sensors, generating 3-h forecasts of temperature, from 4 historical readings
Table 2 Mean RMSE for selected sensors, generating 3-h forecasts of temperature, from 4 historical readings
Table 3 Maximal RMSE for selected sensors, generating 12-h forecasts of temperature, from 4 historical readings
Table 4 Mean RMSE for selected sensors, generating 12-h forecasts of temperature, from 4 historical readings

Since we compute RMSE for each period, we have a series of RMSE values. We compute the mean and the maximal values of this series. Both maximal RMSE and mean RMSE are interesting here since the effectiveness of the model predictive controller can be affected in different ways. A large mean RMSE indicates errors in many of the quarter-hourly forecasts. As a result the MPC will consistently incur higher energy costs than necessary. A large maximal RMSE indicates at least one big forecast error. The inefficiency could be very great for that one period. Since each error metric indicates a separate kind of error, we report both.

Consider the example from Table 1, where we used four historical observations per sensor, generated 3-h forecasts, and measured maximal RMSE. Starting from the empty set, the search in Algorithm 1 considers sets of up to about 10 sensors. Overall, it visited 135 sets of sensors, which is a sharp reduction from the possible \(2^{18}\) = 262,144 sets. The minimal error occurs with nine sensors: Wi, Tw, TL, TD, T, SW, SE, CL and CD, so this is our estimate S* of S. The other nine sensors, namely TP, SS, Pcp, P, LL, LD, HL, HD and H, confounded it.

Using Algorithm 2, we progressively remove sensors from S* to increase the error gradually. The error increases by only 0.0021 °C if we remove CL, the carbon dioxide sensor in the living room. Another small increase, 0.0028 °C, occurs if we ignore the carbon dioxide sensor in the dining room.

We observe some trends in the results. Maximal RMSE is larger than mean RMSE, but within a factor of about two. Forecasting 3 h into the future has lower error than forecasting 12 h, usually within a factor of 5; the difference is always less than 2°, when comparing the best models for short forecasts to the best models for long forecasts. From these two observations we conclude that longer forecast horizons and more sensitive error metrics increase the measured error, which is consistent with intuition.

All of our tabular results degrade gracefully. This suggests that the model does not need to consider interactions between sensors, as we described in Section 4. It also means that sensors that do not appear in the model have been found to be not useful, according to our definition of useful.

There is some consensus, across the different forecast horizons, about which sensors are useful. No best model made use of H, HD, HL or LL, which are, respectively, the external humidity, the humidity in the dining room and in the living room, and the lighting in the living room. Based on this analysis, we would not recommend installing these sensors in this house for the purpose of forecasting temperature.

We cannot identify sensors that are always the most useful, but there is some consistency. We find that T, TD, TP, LD, SW, SE and Tw appear frequently in the smaller sets of sensors. These are, respectively, the external temperature, the temperature in the dining room, the predicted temperature, the lights in the dining room, the sun on the west and east walls, and the twilight indicator. There is also some influence from the CO2 sensors. Thus we can conclude that the future temperature results from a combination of human activities, ambient internal conditions, external weather conditions and time of day.

6.2 Forecasting temperature from longer histories

Starting from the models computed from four historical sensor observations, we ran the same models using eight historical observations per sensor, as shown in Tables 5 and 6. The picture that emerges is similar to when we used four historical observations. The errors are not significantly different; they are sometimes smaller and sometimes larger. This indicates the extra hour of observations is not particularly helpful to the lasso regression model. Again, the maximal RMSE is about twice as large as the mean RMSE, and so we report only the mean error. The same observations apply with regard to longer forecast horizons increasing the measured error. The most useful sensors are approximately the same for both the 4- and 8-history models, although there is some variation in the apparent importance of each.

Table 5 Mean RMSE for selected sensors, generating 3-h forecasts of temperature, from 8 historical readings
Table 6 Mean RMSE for selected sensors, generating 12-h forecasts of temperature, from 8 historical readings

The best set of sensors is smaller when considering forecasts for longer periods into the future. This suggests some factors have influence over the temperature for a brief period, but are less useful at later times. Others are useful over the entire period. For instance, Tw and SW appear quite often in the smaller sets of sensors, especially when there are eight historical values available. This suggests that knowing the longer histories of these two sensors, in particular, is especially useful for making longer predictions. Factors that usually do not appear in the best twelve-hour forecast models include the predicted temperature, the lights and the CO2 sensors. The observation that these sensors have short-term value for predictions agrees with intuition. The temperature predictions are probably not as accurate 12 h into the future as they are for shorter periods. The lighting and CO2 sensors also provide information about activities that are not likely to have a long-term effect on temperature. These sensors report on activities and movements of building occupants that most likely do not occur according to any schedule and have only a short-term effect on the ambient temperature.

6.3 Forecasting temperature differences

Temperature controllers respond to changes in temperature, so some researchers investigate methods that forecast changes in temperature, such as the SML team [22].

Given a time sequence of temperatures y at each quarter-hour, we define \(z_i = y_i - y_{i-1}\) as the sequence of temperature differences over the previous quarter-hour. We set our goal to forecast \(\hat{z}_i\), and otherwise follow the same method that we used when generating the forecasts \(\hat{y}_i\). The results of selecting sensors for this forecasting problem are shown in this section.
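In R, the difference series is a single call, assuming y holds the quarter-hourly temperatures:

```r
# z[i] = y[i] - y[i-1], the change over the previous quarter-hour.
z <- diff(y)
```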

Table 7 shows the errors when forecasting changes in temperature over the next 3 h, using one hour of historical observations, according to various sensors. Table 8 shows the same results but forecasting 12 h of temperature differences.

Table 7 Mean RMSE for selected sensors, generating 3-h forecasts of temperature changes, from 4 historical readings
Table 8 Mean RMSE for selected sensors, generating 12-h forecasts of temperature changes, from 4 historical readings

Our first observation is that forecasting temperature differences gives much smaller errors. When forecasting temperatures, we usually forecast numbers in the range of about 20°, whereas when forecasting temperature differences, we are forecasting numbers that are usually much less than 1°. Thus we should expect the errors to be much lower. For instance, it would not be informative in this house if we were to claim we could forecast that the temperature will not change by more than 1° over any quarter-hour, since it rarely does.

Now we consider slightly increasing the sensor history when forecasting differences in temperature. Table 9 shows the errors when forecasting changes in temperature over the next 3 h, using 2 h of historical observations, according to various sensors. Table 10 uses the same history length, but forecasts 12 h of temperature differences.

Table 9 Mean RMSE for selected sensors, generating 3-h forecasts of temperature changes, from 8 historical readings
Table 10 Mean RMSE for selected sensors, generating 12-h forecasts of temperature changes, from 8 historical readings

We notice that when forecasting temperature differences further into the future, our mean forecast error does not increase as much as it does when forecasting temperature. Instead, the forecast errors are only about 40% greater when forecasting 12 h than they are when forecasting 3 h.

When forecasting temperature differences with more historical readings, that is, using 2 h instead of one, we see an effect similar to what we saw when forecasting temperatures: there is little improvement in mean RMSE gained from the extra available data.

6.4 Forecasting from much longer histories

Finally, we conducted two tests to further explore the effect of additional historical data readings from the sensors. In Section 6.2, we have shown that increasing from 1 to 2 h of sensor data did not improve forecast accuracy very much. In this section, we consider 12 h of sensor data.

Table 11 shows the results of computing 3-h temperature forecasts from 12 h of sensor data. The largest set of sensors has a mean RMSE of 0.05 °C. The selected sensors are a combination of external conditions and ambient temperatures, while no importance is placed on activities of the occupants. This error is significantly less than the errors reported in Tables 2 and 5, both of which are about 0.25 °C.

Table 11 Mean RMSE for selected sensors, generating 3-h forecasts of temperature, from 48 historical readings

Table 12 reports on the accuracy of computing changes in temperature over each 15-min interval in the next 48 h, using 12 h of historical readings. This forecast combines mostly environmental factors with some activities of the building occupants. Specifically, it found that the twilight indicator, the sun on the walls, the precipitation and the lights in the dining room were most informative. For 12-h forecasts, the ambient room conditions were not found to be useful.

Table 12 Mean RMSE for selected sensors, generating 48-h forecasts of temperature changes from 48 historical readings

Table 12 shows an error that is surprisingly low for a 48-h forecast. It is approximately the same error as seen in the 3-h forecasts in Table 7, and it is smaller than the error for the 12-h forecast in Table 8, even though the forecast horizon is increased by a factor of four.

7 Related work

The SML team reports [21] accuracy when forecasting temperature differences over future quarter-hour intervals, using data from among the 88 sensors they installed. They selected from among this set: internal temperature (TD and TL), irradiance (P), internal humidity (HD and HL) and precipitation (Pcp). Based on their results for forecasting over 3 h, a selection of three sensors gave the lowest errors: internal temperature, solar irradiance, and a time-categorical variable. Using these sensors, forecasts for each quarter-hour over 3 h were generated using a combination of forecast models based on ANNs. They achieve a mean absolute error (MAE) of about 0.11 °C. We report forecasts with about half of that error for the same problem; see Table 11.

The SML team, in later work [22], explored the selection of sensors for forecasting temperature differences over each quarter-hour with a forecast horizon of 48 h. They report a maximal MAE of about 1 °C, although the error was often much smaller.

In our investigations of this same data and the same 48-h forecasting problem, we experimented with a naive forecast, computed by always forecasting the mean temperature difference and looking at no sensors at all. This gives a mean RMSE of 0.12 °C. This result can also be compared to our Table 6, which is based on the same data. These forecasts have a mean MAE of about 0.06 °C, which is an order of magnitude smaller than the previously published error.

Feature extraction shares some similarities with feature selection. Feature extraction is the process of defining new features from existing ones, selecting those features with good predictive accuracy and repackaging them into linear combinations that are treated as new features. Partial least squares and principal component analysis are two feature extraction techniques [2, 6].

We applied partial least squares and principal components to the same SML data [15]. Using four historical readings per sensor, we found the RMSE forecast error for both methods to be about 0.7 °C for 3-h forecasts, whereas the comparable mean RMSE values in this paper range from 0.24 to 0.34 °C. Likewise, for twelve-hour forecasts, the RMSE for the feature extraction methods was about 1.7 °C, while ours ranged from 1.15 to 1.42 °C. The results were similar for eight historical readings per sensor. Thus, lasso regression and best-first search exhibit better forecast accuracy than these feature extraction methods for temperature forecasting.

8 Conclusion

A model predictive controller can achieve significant savings by using an accurate temperature forecast when determining whether or not to engage HVAC systems. Temperature forecasts are informed by sensor data. We propose a cost-benefit analysis that balances the costs arising from installation, operation and computation against the benefit of saving energy. A sensor's cost exceeds its benefit if it does not improve forecast accuracy by an amount sufficient to be useful to the controller.

The method we describe generates accurate temperature forecasts using lasso regression. It uses a best-first search technique to incrementally consider larger sets of sensors until no additional sensor improves the forecast accuracy. It then reduces this set by removing sensors incrementally and reporting the resulting sequence of forecast errors. If we assume that energy savings increase with forecast accuracy, this sequence of sets of sensors should help find the optimal set of sensors.

Our system computes a gracefully degrading set of sensors for different situations, depending on the length of the forecast horizon, the number of historical observations, and whether the controller performs better with a lower mean error or a lower maximal error. Our findings indicate that the selection of sensors will be affected by these factors. In a new installation, we propose to temporarily install a large set of sensors and to collect readings from these sensors over several weeks. It should then be possible to determine which sensors to install permanently. Alternatively, in an existing installation, the maintenance and computation costs may be reduced by removing sensors that are not providing benefit. The same gracefully degrading sequence can guide this selection.

Our experiments show that accuracy increases as more data is available for forecasting. Shorter-term forecasts are more accurate than longer-term forecasts, and they derive benefit from more sensors.

We have used lasso regression over lagged data as the underlying modelling technology. While the search technique we employ for selecting sensors can be applied to any underlying modelling technology, lasso regression has shown good performance. In a comparison with previously published forecasts based on artificial neural networks, the lasso forecasts show considerable improvement.

In the future, we plan to apply this proof of concept to a set of small university buildings with 12 sensors, and a model predictive controller.